Thursday, December 19, 2019

AI at its best, pratfalls and all [the common sense problem is the resistance that the world presents to us]

Once again Mark Liberman presents us with an example of modern AI at its best, pratfalls and all, in "Mrs. Transformer-XL Tittlemouse," Language Log, December 18, 2019. It is about this article:
Zihang Dai et al., "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context", arXiv 6/2/2019.

Abstract: Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens. Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch.
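To make the abstract's central mechanism a bit more concrete, here is a minimal sketch of segment-level recurrence: hidden states computed for one segment are cached and reused as extra attention context for the next, which is how the model sees beyond a fixed-length window. Everything here is illustrative, not the authors' code; the "attention" is a toy single-head version in NumPy, whereas the real model uses multi-head attention with relative positional encodings.

```python
import numpy as np

def toy_layer(x, memory):
    """Toy stand-in for one Transformer-XL layer: the current segment
    attends over the concatenation of cached states from the previous
    segment and its own states."""
    context = np.concatenate([memory, x], axis=0)       # [mem_len + seg_len, d]
    scores = x @ context.T / np.sqrt(x.shape[1])        # toy attention scores
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ context                            # new hidden states

def process_segments(segments, d_model=16, mem_len=4):
    memory = np.zeros((mem_len, d_model))               # cache starts empty
    outputs = []
    for seg in segments:
        h = toy_layer(seg, memory)
        outputs.append(h)
        # Cache this segment's states for the next one. In the real model
        # no gradient flows through the cache (a stop-gradient).
        memory = np.concatenate([memory, h], axis=0)[-mem_len:]
    return outputs

segments = [np.random.randn(8, 16) for _ in range(3)]  # three 8-token segments
outs = process_segments(segments)
```

The point of the cache is that each segment's context reaches back through earlier segments, so effective context length grows with depth rather than being capped at the segment size.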
Liberman presents an example of Generated Text from Appendix E:
= The Tale of Mrs. Tittlemouse =
The Tale of Mrs. Tittlemouse is a children's book written and illustrated by H. M. Potter. It was first published in February 1910 by Frederick Warne & Co. and included illustrations by Frederick Warne. Mrs. Tittlemouse was intended to be a children's book, rather than a musical, and the tale describes her attempts to find her own home. Mrs. Tittlemouse is an average kitten and the tale follows a mischievous rabbit named Mr. Tittlemouse who tries to kill her in an attempt to get a ride on a horse. Potter later wrote of the rabbit and her attempts to kill her, " If [ she ] were a fox, I should like to have been able to show her the way. " Potter's work was inspired by The Tale of Little Black Sambo and her two nieces, Petunia and Petunia. It was the first book in the Potter family to be published in a hardback format, and was also the first book in the Peter Rabbit series. The tale was released on 13 November 1910 and was accompanied by a greeting card by Potter's brother-in-law, Frederick Warne. The tale was well received by critics and children, and sold over 600,000 copies in its first few years. In the first year of its publication, Potter's The Tale of Mrs. Tittlemouse won children's book awards in England, America and Australia. It was selected as a " recommended reading " by children in the US and Canada.
= = Plot summary = =
The tale opens with the pet rabbit Mrs. Tittlemouse wandering through a forest in a small litter of four. He is shot and badly injured by a deer after she attempts to escape. Mrs. Tittlemouse decides that she wants to find her own home, because she is eager to go on her own. She goes alone to the farm where she makes a little money by selling a few seeds and building a small cabin in the woods. She is approached by a wealthy hunter named Mr. Tittlemouse, who tries to kill her but Mrs. Tittlemouse kills him by stuffing a rope into his nose and killing him. She is rescued by Mr. Tittlemouse's wife Ruth, but Mrs. Tittlemouse then leaves the woodland with the baby. When she is spotted by
Liberman introduces the example with these words:
This is another note on the amazing ability of modern AI learning techniques to imitate some aspects of natural-language patterning almost perfectly, while managing to miss common sense almost entirely. This probably tells us something about modern AI and also about language, though we probably won't understand what it's telling us until many years in the future.
Yes.

Readers of New Savanna will know that I've been thinking something like that for a couple of years now, and have even attempted to conceptualize it: a number of posts I've labeled "AI Limit" are of this kind, particularly "Computational linguistics & NLP" and "Borges Redux: Computing Babel". In his speech accepting an award from the ACL, Martin Kay (PDF) notes that contemporary AI is using statistics over word distributions as a substitute/proxy for a model of the world. I think that's right; the toy sketch below makes the point concrete. I note as well that the problem of common sense knowledge is one of the problems that put the brakes on old-style symbolic AI. That, of course, is a problem about modeling (the surface of) the world in all its trivial but inescapable variety. There were just so many bits of it to hand-code and, once coded, all those trivial bits exacerbated the problem of combinatorial explosion.
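Kay's point can be illustrated with the crudest possible language model. The sketch below is mine, not Kay's or Liberman's: a bigram model trained on a few toy sentences. It learns only which word tends to follow which, and nothing constrains its output to be true of any world.

```python
from collections import defaultdict, Counter
import random

# Pure word-distribution statistics: count which word follows which.
corpus = ("the tale of mrs tittlemouse is a children's book . "
          "the tale follows a rabbit . the rabbit tries to kill her .").split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def generate(start, n=12):
    word, out = start, [start]
    for _ in range(n):
        choices = bigrams.get(word)
        if not choices:
            break
        word = random.choices(list(choices), weights=choices.values())[0]
        out.append(word)
    return " ".join(out)

print(generate("the"))
# Locally fluent transitions with no model of what is being said --
# the same failure mode, writ small, as the Tittlemouse passage above.
```

Transformer-XL is vastly more sophisticated than this, but on Kay's diagnosis it is still, at bottom, statistics over word distributions standing in for a model of the world.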

It is thus interesting that these new techniques, run on machines that dwarf those of the 1970s and 1980s, are now running into the common sense problem. It is not at all obvious to me that the problem can be solved by using ever more text as fodder and ever more computing power to digest that fodder. The world is just too big and too irreducibly complex to be mastered in that way.

The other side of the issue is that these statistical techniques work very well in closed domains, like chess and Go. In those domains there is hardly any world to speak of, and hence no common sense problem. Moreover, abstractly considered, those games are finite: given enough time and memory, it would be possible to calculate every possible game and then list them all. What's interesting is that the best chess programs seem to have broken into regions of the chess space that human players had not explored, so they exhibit new styles of play.
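Chess and Go are far too large to enumerate in practice, but the finiteness claim holds in principle, and a toy game makes it tangible. The sketch below (mine, purely illustrative) enumerates every complete game of tic-tac-toe, which is small enough to calculate outright.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),     # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),     # columns
         (0, 4, 8), (2, 4, 6)]                # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_games(board="." * 9, player="X"):
    """Walk the entire game tree, counting every distinct complete game."""
    if winner(board) or "." not in board:
        return 1                               # game over: one complete game
    total = 0
    for i, cell in enumerate(board):
        if cell == ".":
            total += count_games(board[:i] + player + board[i + 1:],
                                 "O" if player == "X" else "X")
    return total

print(count_games())   # 255168 complete games: the whole domain, listed out
```

Nothing like this exhaustive listing is available for the open-ended world that common sense has to deal with, which is the contrast the paragraph above is drawing.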

It's as though Go and chess embody the abstract mental powers we bring to bear on the world (Chomskyan generativity? Cartesian rationality?), while the common sense problem, in effect, represents the resistance that the world presents to us. It is the world asserting its existence by daring us: "parse this, and this, and this, and...!"
