Mark Seidenberg has an interesting post today, Cartoonist walks into a language lab… He poses a task for machine learning:
New Yorker cartoons are usually captioned these days, with fewer in the lovely mute style of a William Steig. A general theory of language use should be able to explain how cartoon captions, a genre of text, are understood. The cartoons illustrate (sic) the dependence of language comprehension on context (the one created by the drawing) and background knowledge (about, for example, rats running mazes, guys marooned on islands, St. Peter’s gate, corporate culture, New Yorkers). The popular Caption Contest is an image-labeling task, generating humorous labels for an incongruous scene.
A bit later:
The weekly caption contest has yielded a massive amount of data that is being analyzed using NLP and machine learning techniques. (The contest: Entrants submit captions for a cartoon; from the 5000 or so entries the editors pick three finalists; readers pick the winner by voting on-line.) Just think of the studies that can be done with this goldmine of a data set! Identify the linguistic properties that distinguish the winning captions from the two losers. Build a classifier that can estimate relative funniness from properties such as word choices, grammatical complexity, affective valence (“sentiment”), readability, structure of the joke, etc. Use the classifier to predict the winners on other weeks. Or the rated humorosity of other cartoons.
Heavy hitters from places like Microsoft, Google, Michigan, Columbia, Yale, and Yahoo have taken swings at this ... The results (from the few published studies I found) have been uninspiring.
Liberman then asks a crucial question: “Is labeling New Yorker cartoons harder than playing Go?” – you may recall that not too long ago Google’s DeepMind made a media splash when one of its programs defeated the best human expert on Go. Liberman’s answer to his question is, of course, yes, labeling cartoons is harder.
Go has a conventionalized set-up and explicit rules. A captioning model has to figure out what game is being played. Captioning is a type of scene labeling but that requires recognizing what’s in the scene which in this case is, literally, ridiculous: exaggerated, crude, eccentric, stylized renderings of the world. Quite different from the naturalistic scenes that have been the focus of so much attention in AI.
He goes on to observe:
... humor turns on a vast amount of background knowledge. That feeling when you just don’t get it happens when we either don’t know the relevant stuff or can’t figure out what’s relevant to that cartoon. A deep learning network might well acquire the requisite knowledge of the world but not from 80,000 drawings: insufficient data. Same for analyzing the captions: it’s necessary to know the language. People have acquired most of the knowledge that's required by other means.
That, common sense background knowledge, is one of the problems that squelched classic symbolic processing several decades ago. Researchers realized that in dealing with even very simple language, we draw on a vast back of common sense knowledge. How are we doing to hand-code all that into a computer? And once we've done it, what of combinatorial explosion?
According to Wikipedia Go is a combinatorial game, like Chess and Checkers, but also like Tic-tac-toe. The latter, of course, is trivial, while the former three are not. The moves in such games can be represented as a tree, where each possible game is a path from the root of the tree to some leaf. For each such game the number of such paths is necessarily finite. In the trivial case of Tic-tac-toc the number of paths is relatively small. In the other cases it is not, and the game tree for Go is by far the largest one.
The fact that each of these games is finite means that, in point of abstract principle, none of them is any more complex than Tic-tac-toe. But that’s only in principle, because the principle is indifferent to computational resources. The principle, in effect, supplies all the necessary resources.
Real computation is not like that. Real computation always takes place with finite resources. Go requires more resources than Chess.
But humor, as in the task of supplying captions for New Yorker cartoons, does not appear to be finite even in point of abstract principle. It is a beast of an altogether different kind.
It may be the case that, at any moment in time, each one of us has only a finite stock of common sense knowledge available. I haven’t got the foggiest idea of how many such bits that might be, nor does it much matter. But the collective stock of common sense knowledge must surely be unbounded for all practical purposes – assuming when can define it in some reasonable way. The field available for humor is thus unbounded as well. And that’s different from merely being very very very large, albeit finite.
What about arithmetic? Is it more like those finite games or more like common sense knowledge and humor? If my understanding of Gödel’s proof, and similar mathematical beasts, is reasonable, then arithmetic may look “simple” like Chess or Go, but it is in fact unbounded, like humor.