Wednesday, August 11, 2021

Dual-system mentation in humans and machines [updated]

I’m interested in an article recently posted to arXiv:

Maxwell Nye, Michael Henry Tessler, Joshua B. Tenenbaum, Brenden M. Lake, Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning, 6 July 2021, arXiv:2107.02794 [cs.AI]

We need to consider it in the context of current debates over whether machine learning approaches are fully adequate for the creation of ‘intelligent’ systems – whatever they are – or whether we need ‘classical’ symbolic systems as well [1].

But first I want to make some remarks about language and its relation to cognition. Then we can take a look at the dual-system model.

Language and cognition

Language and cognition are often considered in conjunction with one another. Without cognition language is an empty formal system. While cognition can be and is studied independently of language – animals, after all, have considerable cognitive capabilities while lacking language, though they have more limited means of communication – the particular range and power of human cognition is often, and I believe properly, attributed to the ways language allows us to “grab hold of” and “extend” our cognitive abilities.

We should note that language is itself a ‘full-depth’ system. It has components in direct interaction with the external world (hearing and speaking, seeing and writing) and more abstract components (syntax and discourse). This system has a rich internal structure independent of its linkage to general cognition.

With that in mind, consider this passage from an old paper [2]:

Nonetheless, the linguist Wallace Chafe has quite a bit to say about what he calls an intonation unit, and that seems germane to any consideration of the poetic line. In Discourse, Consciousness, and Time Chafe asserts that the intonation unit is “a unit of mental and linguistic processing” (Chafe 1994, pp. 55 ff. 290 ff.). He begins developing the notion by discussing breathing and speech (p. 57): “Anyone who listens objectively to speech will quickly notice that it is not produced in a continuous, uninterrupted flow but in spurts. This quality of language is, among other things, a biological necessity.” He goes on to observe that “this physiological requirement operates in happy synchrony with some basic functional segmentations of discourse,” namely “that each intonation unit verbalizes the information active in the speaker’s mind at its onset” (p. 63).

While it is not obvious to me just what Chafe means here, I offer a crude analogy to indicate what I understand to be the case. Speaking is a bit like fishing; you toss the line in expectation of catching a fish. But you do not really know what you will hook. Sometimes you get a fish, but you may also get nothing, or an old rubber boot. In this analogy, syntax is like tossing the line while semantics is reeling in the fish, or the boot. The syntactic toss is made with respect to your current position in the discourse (i.e. the current state of the system). You are seeking a certain kind of meaning in relation to where you are now.

The important point is that, contrary to what you might think, you don’t necessarily know what you’re thinking until you have verbalized your thoughts. Thus conversation is often filled with disfluencies where you start and stop, hunt for the right word or phrase, and then continue. Writing can be like that as well, especially when dealing with new and challenging material.

Let’s amplify that point by considering the general outline of language acquisition offered by the great Russian developmental psychologist, Lev Semyonovich Vygotsky [3]. Vygotsky argued, in effect, that thinking – considered as a quasi-verbal voice in the mind – is just internalized speech [4]. The young child learns to speak to herself as others speak to her and thereby gains a mechanism affording some control over her mind. Think of the process Vygotsky describes in terms of the scaffolding metaphor that’s become popular. The adult’s speech scaffolds the child’s behavior, both in action and perception. An adult can direct the child’s perception (“see the dog”) and action (“come here”) through language. The child gradually learns to use her own speech to perform these functions. Finally there’s no need for external scaffolding, that is, no speech either from an adult or from the child herself. One just ‘thinks.’

Now let us think of Vygotsky’s story in relation to the one I previously told about speaking, where we don’t know what we’re going to say until we’ve actually said it, at which point what we hear is either satisfactory, and we keep going, or not, so we stop and hazard another guess. Vygotsky has the child speaking in the presence of an adult who ‘scaffolds’ her activity. The child sees some object, say a ball, looks at it, and offers a word. If the word is correct, the adult nods approval. If not, the adult so indicates and the child tries again. In time the adult can be dispensed with. By that time the child has moved beyond the simple act of naming things and is formulating assertions about them. At that point the child is in the situation Chafe described.

Dual-System, Neuro-Symbolic Reasoning

Now let’s consider the dual-system model developed by Maxwell Nye and his colleagues. Here’s the abstract from their article:

Human reasoning can often be understood as an interplay between two systems: the intuitive and associative (“System 1”) and the deliberative and logical (“System 2”). Neural sequence models—which have been increasingly successful at performing complex, structured tasks—exhibit the advantages and failure modes of System 1: they are fast and learn patterns from data, but are often inconsistent and incoherent. In this work, we seek a lightweight, training-free means of improving existing System 1-like sequence models by adding System 2-inspired logical reasoning. We explore several variations on this theme in which candidate generations from a neural sequence model are examined for logical consistency by a symbolic reasoning module, which can either accept or reject the generations. Our approach uses neural inference to mediate between the neural System 1 and the logical System 2. Results in robust story generation and grounded instruction-following show that this approach can increase the coherence and accuracy of neurally-based generations.

The terms “System 1” and “System 2” are from the well-known work of Daniel Kahneman (Thinking, Fast and Slow, 2011).

Consider the system diagram from the article. You don’t really need to read its fine print; you simply need to know what’s happening in the boxes.

The upper pair of boxes represents System 1, which is automatically generated through a machine learning process. The lower pair of boxes represents System 2, an Old School symbolic system created through hand-coding. The process proceeds as follows (a minimal code sketch follows the list):

  1. The upper left box represents a story as it exists at a certain point in time.
  2. GPT-3 then parses that story into the hand-coded world model (lower left).
  3. System 1 then generates several candidate continuations of the story (upper right).
  4. GPT-3 parses those candidates into propositions consistent with the world model (lower right).
  5. Those propositions are checked against the current state of the story (lower middle).
  6. A proposition consistent with the model is chosen and the corresponding sentence is attached to the ongoing story (far right).
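To make that loop concrete, here’s a minimal sketch in Python. Everything in it is my own illustration, not code from the paper: the names (WorldModel, propose, parse), the toy negation-based consistency check, and the canned stand-ins in the usage example are all hypothetical. In the actual system the proposer and the parser are calls to GPT-3, and the world model is hand-coded for the story domain.

```python
# A minimal sketch of the generate-and-check loop in the list above.
# All names and the toy consistency check are hypothetical, not the authors' code.
from typing import Callable, List, Set


class WorldModel:
    """Hand-coded symbolic model: a set of propositions plus a consistency check."""

    def __init__(self, facts: Set[str]):
        self.facts = set(facts)

    def consistent_with(self, prop: str) -> bool:
        # Toy check: a proposition is inconsistent if its negation is already a fact.
        negation = prop[4:] if prop.startswith("not ") else "not " + prop
        return negation not in self.facts

    def assimilate(self, prop: str) -> None:
        self.facts.add(prop)


def extend_story(story: str,
                 world: WorldModel,
                 propose: Callable[[str, int], List[str]],  # System 1, e.g. GPT-3
                 parse: Callable[[str], List[str]],         # sentence -> propositions
                 n_candidates: int = 5) -> str:
    """One step: propose candidates (steps 3-4), check them (step 5), accept one (step 6)."""
    for sentence in propose(story, n_candidates):
        props = parse(sentence)
        if all(world.consistent_with(p) for p in props):
            for p in props:
                world.assimilate(p)
            return story + " " + sentence
    raise RuntimeError("all candidates rejected; resample from System 1")


# Toy usage with canned stand-ins for the neural components.
world = WorldModel({"ball is red"})
propose = lambda story, n: ["The ball was blue.", "She kicked the red ball."]
parse = lambda s: ["not ball is red"] if "blue" in s else ["ball is red"]
print(extend_story("A girl found a ball.", world, propose, parse))
# -> "A girl found a ball. She kicked the red ball."
```

The point of the structure, here as in the article, is that System 1 is free to be fluent and fallible because System 2 stands behind it as a filter.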

What is going on here is roughly, and I do mean roughly, what I wrote about in discussing language. Something is proposed, checked, and then redone if necessary. In this case the checking is done by a symbolic model. I suggest that is the case with natural language in human speakers as well.

With that in mind let’s consider these remarks about future possibilities for the dual system model:

A promising direction for future work is to incorporate learning into the System 2 world model. Currently, the minimal world knowledge that exists in System 2 can be easily modified, but changes must be made by hand. Improvements would come from automatically learning and updating this structured knowledge, possibly by incorporating neuro-symbolic learning techniques (Ellis et al., 2020; Mao et al., 2019).

Learning could improve our dual-system approach in other ways, e.g., by training a neural module to mimic the actions of a symbolic System 2. The symbolic System 2 judgments could be used as a source of supervision; candidate utterances rejected by the symbolic System 2 model could be used as examples of contradictory sentences, and accepted utterances could be used as examples of non-contradictory statements. This oversight could help train a neural System 2 contradiction-detection model capable of more subtleties than its symbolic counterpart, especially in domains where labeled examples are otherwise unavailable. This approach may also help us understand aspects of human learning, where certain tasks that require slower, logical reasoning can be habitualized over time and tackled by faster, more intuitive reasoning.

I can’t help but think that the training of “a neural System 2 contradiction-detection model” is rather like the child’s internalization of the scaffolding functions provided by sympathetic adults. The net result would be that a neural system internalizes the structure of a classical symbolic model, thereby coming to implement that model on a neural foundation.

Roughly speaking, think of the child language learner as the neural model, as System 1. The scaffolding adult provides the symbolic logical model, System 2. Over time, the logical structure provided by System 2 becomes incorporated into the ‘texture’ of neural System 1.
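As a rough illustration of that idea, the symbolic checker’s verdicts can simply be recycled as training labels. This is a hedged sketch under my own assumptions: symbolic_accepts is a hypothetical stand-in for the System 2 model, and the downstream classifier could be any standard text-classification setup.

```python
# Sketch: recycle symbolic System 2 verdicts as labels for a neural
# contradiction detector. symbolic_accepts is a hypothetical stand-in.
from typing import Callable, List, Tuple


def label_candidates(candidates: List[str],
                     symbolic_accepts: Callable[[str], bool]) -> List[Tuple[str, int]]:
    """Label each candidate utterance: 1 = accepted (consistent), 0 = rejected."""
    return [(s, int(symbolic_accepts(s))) for s in candidates]


# Example with a trivial stand-in checker.
labeled = label_candidates(
    ["The ball was blue.", "She kicked the red ball."],
    symbolic_accepts=lambda s: "blue" not in s,
)
print(labeled)  # [('The ball was blue.', 0), ('She kicked the red ball.', 1)]
```

The labeled pairs could then fine-tune a pretrained text classifier, so that over many rounds the neural model comes to approximate, and perhaps refine, the symbolic model’s judgments.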

* * * * *

In thinking about this I considered attempting to draw the interaction Vygotsky described – I do draw it out in my longer exposition [3] – in such a way that it matched the system diagram for the dual-system model. But I decided against it. It would have been tricky at best, and likely something of a force-fit, and I’m not at all sure the effort would have been rewarded by commensurate insight. The rough and ready correspondence I’ve suggested seems sufficient to my purpose. And my primary purpose has been to assure myself that this model represents a significant step toward the eventual goal of implementing symbolic systems on a neural foundation. [See some tweets I’ve appended to the end.]

That’s what I had in mind when, over two decades ago, I played around with a notation system I called attractor nets [5]. There I was imagining a logical structure implemented over a large and varied attractor landscape. The logical structure is represented by a network formalism developed by Sydney Lamb [4]. The network is quite different from those generally used in classical symbolic systems, where nodes represent objects of various kinds and arcs represent types of relationships among those objects. In Lamb’s nets the nodes are logical operators (OR, AND) while the arcs carry the content of the net. In an attractor net each arc corresponds to a basin of attraction. The net then represents a logical structure over basins of attraction.

It was an interesting conceptual experiment. It needs to be taken farther.
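To suggest what such a net looks like in code, here is a toy rendering of the scheme as I’ve just described it: nodes are AND/OR operators, named arcs carry the content, and an arc counts as active when the system sits in the corresponding basin of attraction. The data structures are purely illustrative; they are neither Lamb’s notation nor my working notes.

```python
# Toy rendering of an attractor net: nodes are logical operators, arcs carry
# content, and each arc is identified with a basin of attraction.
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Gate:
    op: str            # "AND" or "OR"
    inputs: List[str]  # names of the arcs feeding this node


def arc_active(arc: str, net: Dict[str, Gate], basins: Set[str]) -> bool:
    """An arc is active if the system occupies its basin, or if the gate driving it fires."""
    if arc in basins:
        return True
    gate = net.get(arc)
    if gate is None:
        return False
    values = [arc_active(a, net, basins) for a in gate.inputs]
    return all(values) if gate.op == "AND" else any(values)


# Arc "c" is driven by an AND node over arcs "a" and "b".
net = {"c": Gate("AND", ["a", "b"])}
print(arc_active("c", net, basins={"a", "b"}))  # True
print(arc_active("c", net, basins={"a"}))       # False
```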

References

[1] I believe that we do need symbolic systems. But I also believe that they can ultimately be implemented in some kind of neural net. See this recent post for further remarks, “Geoffrey Hinton says deep learning will do everything. I’m not sure what he means, but I offer some pointers. Version 2,” New Savanna, May 31, 2021, https://new-savanna.blogspot.com/2021/05/geoffrey-hinton-says-deep-learning-will_31.html.

[2] “Kubla Khan” and the Embodied Mind, PsyArt: A Hyperlink Journal for the Psychological Study of the Arts, Article 030915, November 29, 2003, https://www.academia.edu/8810242/_Kubla_Khan_and_the_Embodied_Mind.

[3] I’ve explained this in a bit more detail in a blog post, “Vygotsky Tutorial (for Connected Courses),” New Savanna, September 10, 2020, https://new-savanna.blogspot.com/2014/10/vygotsky-tutorial-for-connected-courses.html.

That’s excerpted from a long article, First Person: Neuro-Cognitive Notes on the Self in Life and in Fiction, PsyArt: A Hyperlink Journal for the Psychological Study of the Arts, August 21, 2000. Downloadable version, https://www.academia.edu/8331456/First_Person_Neuro-Cognitive_Notes_on_the_Self_in_Life_and_in_Fiction.

[4] Sydney Lamb, the computational linguist, believes this as well. He argues it in Pathways of the Brain: The Neurocognitive Basis of Language, Amsterdam: John Benjamins (1999), pp. 181-182.

[5] I never produced an account that I thought was ready for others to read. But, in addition to piles of notes, I did produce two documents intended to summarize the work for my own purposes. The first of these explains Lamb’s notation and how I adapted it to my purposes, though the exposition is a bit roundabout and perhaps attempts too much. The second is a collection of diagrams, that is, constructions. While the diagrams are heavily notated, they presume that you understand the basic conventions as set forth in the first document. I am currently working on a document that will be a more suitable introduction to the notation.

William Benzon, Attractor Nets, Series I: Notes Toward a New Theory of Mind, Logic, and Dynamics in Relational Networks, Working Paper, 52 pp., https://www.academia.edu/9012847/Attractor_Nets_Series_I_Notes_Toward_a_New_Theory_of_Mind_Logic_and_Dynamics_in_Relational_Networks.

William Benzon, Attractor Nets 2011: Diagrams for a New Theory of Mind, Working Paper, 55 pp., https://www.academia.edu/9012810/Attractor_Nets_2011_Diagrams_for_a_New_Theory_of_Mind.

* * * * *

[The tweets mentioned above were embedded here; the embedded content has not survived. Responding to one of them:]

Clever, no doubt, but I don't believe it. We've got two different processes going on, not one process with a variable timer. Still....
