Gary Marcus and Ernest Davis, Has AI found a new Foundation?, The Gradient.
You may have heard that over a 100 AI researchers recently gathered at Stanford to announce the emergence of what they call "Foundation Models" as the new and reigning paradigm in AI.
"Although the term is new," Marcus and Davis explain, "the general approach is not." They elaborate:
You train a big neural network (like the well-known GPT-3) on an enormous amount of data, and then you adapt (“fine-tune”) the model to a bunch of more specific tasks (in the words of the report, "a foundation model ...[thus] serves as [part of] the common basis from which many task-specific models are built via adaptation"). The basic model thus serves as the “foundation” (hence the term) of AIs that carry out more specific tasks. The approach started to gather momentum in 2018, when Google developed the natural language processing model called BERT, and it became even more popular with the introduction last year of OpenAI’s GPT-3.
Their article is a critique of the approach, noting, "The broader AI community has had decidedly mixed reactions to the announcement from Stanford and some noted scientists have voiced skepticism or opposition." I'm sympathetic to these critiques, but I'm not particularly interested in summarizing them here. You can read the whole article for that.
I'm interested in their brief characterization of what a proper foundation would require:
First, a general intelligence needs to maintain a cognitive model that keeps track of what it knows about the world. An AI system that powers a domestic robot must keep track of what is in the house. An AI system that reads a story or watches a movie must keep track both of the current state of people and things, and of their whole history so far.
Second, any generally intelligent system will require a great deal of real-world knowledge, and that knowledge must be accessible and reusable. A system must be able to encode a fact like “Most people in Warsaw speak Polish” and use it in the service of drawing inferences. (If Lech is from Warsaw, there is a good chance he speaks Polish; if we plan to visit him in Warsaw, we might want to learn a little Polish before we visit, etc.).
Third, a system must be able not only to identify entities (e.g., objects in a photo or video) but also be able to infer and reason about the relationships between those entities. If an AI watches a video that shows a person drinking cranberry grape juice, it must not only recognize the objects, but realize that the juices have been mixed, the mixture has been drunk, and the person has quenched their thirst.
Fourth, the notion that linguists call compositionality is similarly central; we understand wholes in terms of their parts. We understand that the phrase "the woman who went up a mountain and came down with a diamond" describes a particular woman. We can infer from the parts that (other things being equal) she now possesses a diamond.
Fifth, in order to communicate with people and reason about the world a wide range of common sense knowledge that extends beyond simply factoids is required. In our view (see Rebooting AI), common sense must start with a basic framework of understanding time, space, and causality that includes fundamental categories like physical objects, mental states, and interpersonal interactions.
Sixth, intelligent agents must be able to reason about what they know: if you know that a mixture of cranberry juice and grape juice is non-toxic, you can infer that drinking it is unlikely to cause you to die.
Finally, we would hope that any general intelligence would possess a capacity to represent and reason about human values. A medical advice chatbot should not recommend suicide.
In the end, it all comes down to trust. Foundation models largely try to shortcut all of the above steps. Examples like the juice case show the perils of those kinds of shortcuts. The inevitable result is systems that are untrustworthy. The initial enthusiasm for GPT-3 for example has been followed by a wave of panic as people have realized how prone these systems are to producing obscenity, prejudiced remarks, misinformation, and so forth. Large pretrained statistical models can do almost anything, at least enough for a proof of concept, but there is precious little that they can do reliably—precisely because they skirt the foundations that are actually required.
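To make the second and sixth requirements a bit more concrete, here is a toy sketch of what "encode a fact and use it in the service of drawing inferences" could look like. It is my own illustration, not anything Marcus and Davis propose: the names `DefaultRule` and `KnowledgeBase`, and the 0.9 confidence standing in for "most", are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DefaultRule:
    """A defeasible generalization: 'Most people in <place> speak <language>'."""
    place: str
    language: str
    strength: float  # hypothetical confidence standing in for "most"


class KnowledgeBase:
    """Toy store of explicit facts that can be inspected and reused."""

    def __init__(self) -> None:
        self.rules = []          # general defaults (DefaultRule objects)
        self.origin = {}         # person -> place they are from
        self.exceptions = set()  # (person, language) pairs known not to hold

    def likely_speaks(self, person: str, language: str) -> Optional[float]:
        """Confidence that `person` speaks `language`, or None if unknown."""
        # A specific known exception overrides the general default.
        if (person, language) in self.exceptions:
            return 0.0
        place = self.origin.get(person)
        for rule in self.rules:
            if rule.place == place and rule.language == language:
                return rule.strength
        return None  # no relevant knowledge, so no conclusion is drawn


kb = KnowledgeBase()
kb.rules.append(DefaultRule("Warsaw", "Polish", 0.9))

kb.origin["Lech"] = "Warsaw"
lech = kb.likely_speaks("Lech", "Polish")   # default rule applies

kb.origin["Ana"] = "Warsaw"
kb.exceptions.add(("Ana", "Polish"))
ana = kb.likely_speaks("Ana", "Polish")     # exception defeats the default

sam = kb.likely_speaks("Sam", "Polish")     # nothing known about Sam
```

The point of the sketch is only that the fact is stored in a form the system can look up, override, and reuse for new individuals. It says nothing about how such knowledge would be acquired at scale, which is the hard part.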
I'm not sure whether or not those requirements are adequate; I've not yet attempted to think it through. But they are a sobering reminder of what the foundationalists have yet to think through.
The whole article is worth reading.