Monday, May 31, 2021

Geoffrey Hinton says deep learning will do everything. I’m not sure what he means, but I offer some pointers. Version 2.

This is updated from a previous version to include a passage by Sydney Lamb.

* * * * *

Late last year Geoffrey Hinton had an interview with Karen Hao [1] in which he said “I do believe deep learning is going to be able to do everything,” with the qualification that “there’s going to have to be quite a few conceptual breakthroughs.” I’m trying to figure out whether or not, to what extent, in what way I (might) agree with him.

Neural Vectors, Symbols, Reasoning, and Understanding

Hinton believes that “What’s inside the brain is these big vectors of neural activity” and that one of the breakthroughs we need is “how you get big vectors of neural activity to implement things like reason.” That will certainly require a massive increase in scale. Thus while GPT-3 has 175 billion parameters, the brain has about 100 trillion, where Hinton treats each synapse as a parameter.

Correspondingly Hinton rejects the idea that symbolic reasoning is primitive to the nervous system (my formulation), rather “we do internal operations on big vectors.” What about language? He doesn’t address the issue directly but he does say that “symbols just exist out there in the external world.” I do think that covers language, speech sounds, written words, gestural signs, those things are out there in the external world. But the brain uses “big vectors of neural activity” to process those. 

Hinton’s remark about symbols bears comparison with a remark by Sydney Lamb: “the linguistic system is a relational network and as such does not contain lexemes or any objects at all. Rather it is a system that can produce and receive such objects. Those objects are external to the system, not within it”[2]. Lamb has come to think of his approach as neurocognitive linguistics and, while his sense of the nervous system is somewhat different from Hinton’s, they agree on this issue and, in the current intellectual climate, that agreement is of some significance. For Lamb is a first generation researcher in machine translation and so was working when most AI research was committed to symbolic systems. We’ll return to Lamb later as I think the notation he developed is a way to bring symbolic reasoning within range of Hinton’s “big vectors of neural activity”.

But now let’s return to Hinton with a passage from an article he co-authored with Yann LeCun and Yoshua Bengio [3]:

In the logic-inspired paradigm, an instance of a symbol is something for which the only property is that it is either identical or non-identical to other symbol instances. It has no internal structure that is relevant to its use; and to reason with symbols, they must be bound to the variables in judiciously chosen rules of inference. By contrast, neural networks just use big activity vectors, big weight matrices and scalar non-linearities to perform the type of fast ‘intuitive’ inference that underpins effortless commonsense reasoning.

I note, however, that commonsense reasoning seems to be problematic for everyone.[4]
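The contrast drawn in that passage can be made concrete with a minimal sketch. It assumes nothing from the article itself: the three toy vectors are hand-picked and purely hypothetical, and the function names are mine. The point is only that symbol tokens support a yes-or-no identity test, while activity vectors support graded comparison.

```python
import math

def same_symbol(a: str, b: str) -> bool:
    """Symbolic view: the only usable property of a token is identity."""
    return a == b

def cosine(u, v):
    """Vector view: similarity is graded, not all-or-nothing."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical, hand-picked toy vectors (not learned from any data).
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

# As symbols, "cat" is equally unlike "dog" and "car".
symbol_view = (same_symbol("cat", "dog"), same_symbol("cat", "car"))

# As vectors, "cat" lies much closer to "dog" than to "car".
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_car = cosine(vectors["cat"], vectors["car"])
```

Nothing here amounts to inference, of course. It shows only the kind of internal structure that the symbolic view lacks and that vector operations can exploit.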

Let’s look at one more passage from the interview:

For things like GPT-3, which generates this wonderful text, it’s clear it must understand a lot to generate that text, but it’s not quite clear how much it understands.

I’m not sure that it is at all useful to say that GPT-3 understands anything. I think that, in using that term, Hinton is displaying what I’ve come to think of as the word illusion.[5] Briefly, GPT-3’s language model is constructed over a corpus consisting entirely of word forms, of signifiers without signifieds, to use an old terminology. Hinton knows that, of course. But he, like all of us, understands texts on the basis of word forms alone, and so, in effect, he credits GPT-3 with somehow having induced meaning from a statistical distribution. The text it generates looks pretty good, no? Yes. And that is something we do need to understand: just what is GPT-3 doing, and how does it do it? But this is not the place to enter into that.[6]

I think that GPT-3’s remarkable performance based on such ‘shallow’ material should prompt us into reconsidering just what humans are doing when we produce everyday ‘boilerplate’ text. Consider this passage from LeCun, Bengio, and Hinton, where they are referring to the use of an RNN:

This rather naive way of performing machine translation has quickly become competitive with the state-of-the-art, and this raises serious doubts about whether understanding a sentence requires anything like the internal symbolic expressions that are manipulated by using inference rules. It is more compatible with the view that everyday reasoning involves many simultaneous analogies that each contribute plausibility to a conclusion.

In dealing with these utterly remarkable devices, we would do well to rein in our narcissistic investment in the routine use of our ‘higher’ cognitive and linguistic capacities as opposed to our mere sensory-motor competence. It’s all neural vectors.

Note, however, that it is one thing to say that “we do internal operations on big vectors.” I agree with that. That’s not quite the same as saying we can do everything with deep learning. Deep learning is a collection of architectures, but I’m not sure such architectures are adequate for internalizing the vectors needed to effectively mimic human perceptual and cognitive behavior. The necessary conceptual breakthroughs will likely take us considerably beyond deep learning engines. With that qualification, let’s continue.

How the brain might be doing it

I find Hinton’s position, with the caveats I’ve mentioned, rather congenial. Which is to say that I can make sense of it in terms of issues I’ve thought through in my own work.

Some years ago David Hays and I wanted to come to terms with neuroscience and ended up reviewing a wide range of work and writing a paper entitled “Principles and Development of Natural Intelligence.”[7] The principles are ordered such that principle N assumes principle N-1. We called the fifth and last principle indexing:

The indexing principle is about computational geometry, by which we mean the geometry, that is, the architecture (Pylyshyn, 1980) of computation rather than computing geometrical structures. While the other four principles can be construed as being principles of computation, only the indexing principle deals with computing in the sense it has had since the advent of the stored program digital computer. Indexed computation requires (1) an alphabet of symbols and (2) relations over places, where tokens of the alphabet exist at the various places in the system. The alphabet of symbols encodes the contents of the calculation while the relations over places, i.e. addresses, provide the means of manipulating alphabet tokens in carrying out the computation. [...] Within the context of natural intelligence, indexing is embodied in language. Linguists talk of duality of patterning (Hockett, 1960), the fact that language patterns both sounds and sense. The system which patterns sound is used to index the system which patterns sense.

In short, “indexing gives computational geometry, and language enables the system to operate on its own geometry.” This is where we get symbols and complex reasoning.

I should note that, while we talked of “an alphabet of symbols” and “relations over places” we were not asserting that that’s what was going on in the brain. That’s what’s actually going on in computers, but it applies only figuratively to the brain. The system that is using sound patterns to index patterns of sense is using one set of neural vectors (though we didn’t use that term) to index a different set of neural vectors.
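A minimal sketch of what it could mean for one set of vectors to index another follows. It is an illustration under stated assumptions, not the mechanism Hays and I proposed: the paired patterns are hypothetical toy values, and a best-match dot product stands in for whatever the nervous system actually does.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Hypothetical paired patterns: each "sound" vector indexes one "sense" vector.
# No symbolic labels are stored; the pairing itself does the indexing.
pairs = [
    ([1, 0, 0, 1], [0.9, 0.1, 0.7]),  # sound pattern 0 -> sense pattern 0
    ([0, 1, 1, 0], [0.2, 0.8, 0.6]),  # sound pattern 1 -> sense pattern 1
]

def retrieve_sense(heard):
    """Return the sense vector paired with the best-matching sound pattern."""
    sound, sense = max(pairs, key=lambda pair: dot(pair[0], heard))
    return sense

# A degraded version of sound pattern 0 still retrieves sense pattern 0.
noisy_input = [0.9, 0.1, 0.0, 0.8]
retrieved = retrieve_sense(noisy_input)
```

The lookup tolerates noise in the indexing pattern, which a literal address in a stored-program computer does not. That is one reason the talk of addresses applies only figuratively.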

How do we get deep learning to figure that out? I note that automatic image annotation is a step in that direction [8], but have nothing to say about that here.

Instead I want to mention some informal work I did some years ago on something I call attractor nets.[9] The general idea was to use Sydney Lamb’s relational networks, in which nodes are logical operators, as a tertium quid between the symbol-based semantic networks Hays and I had worked on in the 1970s and the attractor landscapes of Walter Freeman’s neurodynamics. I showed – informally, using diagrams – how using logical operators (AND, OR) over attractor basins in different neurofunctional areas could reconstruct symbolic systems represented as directed graphs. Each node in a symbolic graph corresponds to a basin of attraction, that is, an attractor. In the present context we can think of each neurofunctional area as corresponding to a collection of neural vectors and the attractors as objects represented by those vectors. An attractor net would then become a way of thinking about how complex reasoning could be accomplished with neural vectors.
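To give a rough feel for the idea, here is a toy sketch. It is emphatically not the attractor net notation, nor Freeman’s neurodynamics: I substitute a small Hopfield-style network for each neurofunctional area, and the stored patterns, area names, and helper functions are all hypothetical. Each area settles a noisy input into one of its attractors, and AND/OR nodes then operate over which basins the areas have landed in.

```python
def train(patterns):
    """Hebbian weights for a small Hopfield-style network (no self-connections)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def settle(w, state, sweeps=5):
    """Update units one at a time until the state falls into an attractor."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if h >= 0 else -1
    return s

def in_basin(w, state, attractor):
    """True if the area, started at `state`, settles into `attractor`."""
    return settle(w, state) == attractor

def and_node(*inputs):
    return all(inputs)

def or_node(*inputs):
    return any(inputs)

# Two hypothetical areas, each storing two orthogonal patterns as attractors.
A = [1, 1, 1, 1, -1, -1, -1, -1]
B = [1, -1, 1, -1, 1, -1, 1, -1]
C = [1, 1, -1, -1, 1, 1, -1, -1]
D = [1, -1, -1, 1, 1, -1, -1, 1]
area_1 = train([A, B])
area_2 = train([C, D])

# Noisy inputs: A with its first unit flipped, C with its last unit flipped.
near_A = [-1, 1, 1, 1, -1, -1, -1, -1]
near_C = [1, 1, -1, -1, 1, 1, -1, 1]

# A conjunctive node is satisfied only when both areas settle appropriately.
both = and_node(in_basin(area_1, near_A, A), in_basin(area_2, near_C, C))
# A disjunctive node is satisfied when either does.
either = or_node(in_basin(area_1, near_A, B), in_basin(area_2, near_C, C))
```

The logical operators never see the vectors themselves, only whether an area has settled into a given basin. That is the sense in which each node of a symbolic graph could correspond to an attractor.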

In the attractor net notation word forms, or signifiers, are distinct from word meanings, or signifieds. Is that distinction important for complex reasoning? I believe it is, though I’m not interested in constructing an argument at this point. That, I believe, puts a limit on what one can expect of engines like GPT-3. That too requires an argument.

So, what about natural vs. artificial intelligence?

The notion of intelligence is somewhat problematic. As a practical matter I believe that a formulation by Robin Hanson is adequate: “‘Intelligence’ just means an ability to do mental/calculation tasks, averaged over many tasks.”[10] As for the difference between artificial and natural, that comes down to four things:

1) a living system vs. an inanimate system,
2) a carbon-based organic electro-chemical substrate vs. a silicon-based electronic substrate,
3) real neurons (having on average 10K connections with others) vs. considerably simpler artificial neurons realized in program code, and
4) the neurofunctional architecture and innate capacities of a real brain vs. the system architecture of a digital computing system.

Make no mistake, those differences are considerable. But I think we now have in hand a body of concepts and models that is rich enough to support ever more sophisticated interaction between students of neuroscience and students of artificial intelligence. To the extent that our research and teaching institutions can support that interaction I expect to see progress accelerate in the future. I offer no predictions about what will come of this interaction.

Some related posts

William Benzon, Showdown at the AI Corral, or: What kinds of mental structures are constructible by current ML/neural-net methods? [& Miriam Yevick 1975], New Savanna, June 3, 2020, https://new-savanna.blogspot.com/2020/06/showdown-at-ai-corral-or-what-kinds-of.html.

William Benzon, What’s AI? – Part 2, on the contrasting natures of symbolic and statistical semantics [can GPT-3 do this?], New Savanna, July 17, 2020, https://new-savanna.blogspot.com/2019/11/whats-ai-part-2-on-contrasting-natures.html.

William Benzon, A quick note on the ‘neural code’ [AI meets neuroscience], New Savanna, April 20, 2021, https://new-savanna.blogspot.com/2021/04/a-quick-note-on-neural-code-ai-meets.html.

References

[1] Interview with Karen Hao, AI pioneer Geoff Hinton: “Deep learning is going to be able to do everything”, MIT Technology Review, Nov. 3, 2020. https://www.technologyreview.com/2020/11/03/1011616/ai-godfather-geoffrey-hinton-deep-learning-will-do-everything/

[2] Sydney Lamb, Linguistic Structure: A Plausible Theory, Language Under Discussion, 4(1) 2016, 1–37, https://doi.org/10.31885/lud.4.1.229.

[3] From Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, Deep Learning, Nature, 521 28 May 2015, 436-444, https://doi.org/10.1038/nature14539.

[4] As an example I offer a recent post in which I quiz GPT-3 about a Jerry Seinfeld bit: Analyze This! Screaming on the flat part of the roller coaster ride [Does GPT-3 get the joke?], May 7, 2021, https://new-savanna.blogspot.com/2021/05/analyze-this-screaming-on-flat-part-of.html.

[5] See my post, The Word Illusion, May 12, 2021, https://new-savanna.blogspot.com/2021/05/the-word-illusion.html.

[6] For some extended remarks, see my working paper, GPT-3: Waterloo or Rubicon? Here be Dragons, Working Paper, Version 2, August 20, 2020, 34 pp., https://www.academia.edu/43787279/GPT_3_Waterloo_or_Rubicon_Here_be_Dragons_Version_2.

[7] William Benzon and David Hays, Principles and Development of Natural Intelligence, Journal of Social and Biological Structures, Vol. 11, No. 8, July 1988, 293-322, https://www.academia.edu/235116/Principles_and_Development_of_Natural_Intelligence.

[8] Wikipedia, Automatic image annotation, https://en.wikipedia.org/wiki/Automatic_image_annotation.

[9] William Benzon, Attractor Nets, Series I: Notes Toward a New Theory of Mind, Logic, and Dynamics in Relational Networks, Working Paper, 52 pp., https://www.academia.edu/9012847/Attractor_Nets_Series_I_Notes_Toward_a_New_Theory_of_Mind_Logic_and_Dynamics_in_Relational_Networks.

William Benzon, Attractor Nets 2011: Diagrams for a New Theory of Mind, Working Paper, 55 pp., https://www.academia.edu/9012810/Attractor_Nets_2011_Diagrams_for_a_New_Theory_of_Mind.

William Benzon, From Associative Nets to the Fluid Mind, Working Paper. October 2013, 16 pp. https://www.academia.edu/9508938/From_Associative_Nets_to_the_Fluid_Mind.

[10] Robin Hanson, I Still Don’t Get Foom, Overcoming Bias, July 24, 2014, https://www.overcomingbias.com/2014/07/30855.html.
