One theme that comes up in various discussions of artificial intelligence is that the discipline is primarily an empirical one that lacks theoretical grounding. The default, and perhaps dominant, view is that what we’re doing is producing results, so damn the torpedoes – full speed ahead! But the call to theory keeps nagging, perhaps most recently in a panel discussion entitled Research on Intelligence in the Age of AI, hosted by MIT’s Center for Brains, Minds, and Machines on its 10th anniversary.
One theme that has been kicking around for several decades is that two different styles of computational regime underlie perception, action, and cognition. My purpose here is to compare the views that Miriam Lipschutz Yevick articulated about this dichotomy in 1975 and 1978 with those expressed by Yoshua Bengio, Yann LeCun, and Geoffrey Hinton in their 2018 Turing Award lecture, which was published in 2021.
Bengio, LeCun, and Hinton, 2018
Let’s start with Bengio, LeCun, and Hinton, who won the Turing Award in 2018. They published their paper, Deep learning for AI, in 2021. In that paper they asserted:
There are two quite different paradigms for AI. Put simply, the logic-inspired paradigm views sequential reasoning as the essence of intelligence and aims to implement reasoning in computers using hand-designed rules of inference that operate on hand-designed symbolic expressions that formalize knowledge. The brain-inspired paradigm views learning representations from data as the essence of intelligence and aims to implement learning by hand-designing or evolving rules for modifying the connection strengths in simulated networks of artificial neurons.
In the logic-inspired paradigm, a symbol has no meaningful internal structure: Its meaning resides in its relationships to other symbols which can be represented by a set of symbolic expressions or by a relational graph. By contrast, in the brain-inspired paradigm the external symbols that are used for communication are converted into internal vectors of neural activity and these vectors have a rich similarity structure. Activity vectors can be used to model the structure inherent in a set of symbol strings by learning appropriate activity vectors for each symbol and learning non-linear transformations that allow the activity vectors that correspond to missing elements of a symbol string to be filled in. This was first demonstrated in Rumelhart et al. on toy data and then by Bengio et al. on real sentences. A very impressive recent demonstration is BERT, which also exploits self-attention to dynamically connect groups of units, as described later.
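To get a feel for how little machinery that basic idea requires, here is a minimal sketch in Python (PyTorch). The toy vocabulary, the three-symbol strings, and the layer sizes are my own illustrative inventions, not the actual setups of Rumelhart et al. or Bengio et al.; the point is only to show activity vectors plus a learned non-linear transformation filling in the missing element of a symbol string.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy vocabulary and "symbol strings" (hypothetical, for illustration only).
symbols = ["alice", "bob", "carol", "dave", "parent", "sibling"]
idx = {s: i for i, s in enumerate(symbols)}
triples = [("alice", "parent", "bob"),
           ("alice", "parent", "carol"),
           ("bob", "sibling", "carol"),
           ("carol", "sibling", "bob"),
           ("dave", "parent", "alice")]
data = torch.tensor([[idx[a], idx[r], idx[b]] for a, r, b in triples])

embed = nn.Embedding(len(symbols), 8)   # an activity vector per symbol
hidden = nn.Linear(16, 32)              # learned non-linear transformation
out = nn.Linear(32, len(symbols))       # scores over the vocabulary
params = list(embed.parameters()) + list(hidden.parameters()) + list(out.parameters())
opt = torch.optim.Adam(params, lr=0.05)

# Train the net to fill in the missing third element of each string.
for step in range(500):
    x = torch.cat([embed(data[:, 0]), embed(data[:, 1])], dim=1)
    loss = F.cross_entropy(out(torch.tanh(hidden(x))), data[:, 2])
    opt.zero_grad(); loss.backward(); opt.step()

# Fill in the blank in "alice parent ___" (bob and carol are both valid).
probe = torch.cat([embed(torch.tensor([idx["alice"]])),
                   embed(torch.tensor([idx["parent"]]))], dim=1)
print(symbols[out(torch.tanh(hidden(probe))).argmax().item()])
```

Notice that “alice parent ___” has two legitimate completions in the toy data, so the net must split its bets between them, a small taste of the “rich similarity structure” the quotation mentions.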
As I said at the beginning, some such characterization of two modes of thinking has been around for some time, though it is expressed in various ways. I have no problem recognizing such a distinction.
Yevick, 1975 and 1978
Miriam Yevick recognized that distinction in her 1975 paper, Holographic or Fourier Logic (Pattern Recognition 7, 187-213). That was at the peak of interest in logic-inspired AI; it was the year Newell and Simon won the Turing Award, and their paper, Computer Science as Empirical Inquiry: Symbols and Search, was published the following year.
Yevick was a mathematician, not a cognitive scientist, and had become interested in optical holography through her extensive correspondence with the physicist David Bohm during the 1950s. During the 1960s a number of thinkers, including the neuroscientist Karl Pribram and the cognitive scientist Christopher Longuet-Higgins, had become interested in holography as a model for neural processing. It’s that interest Yevick had in mind when she wrote her article. Here is one statement from that article:
It has recently been conjectured that neural holograms enter as units in the thought process. If holographic processes do occur in the brain and are instrumental in thought, then the logical operations implicit in these processes could be considered as intuitive and enter as units in our mental and mathematical computations.
It has also been said that: “if we want the computer to have eyes, we shall first have to give him instruction in the facts of life”.
We maintain in this paper that a language of thought in which holographic operations enter as primitives is essentially different from one in which the same operations are carried out sequentially and hence over a finite time span [...] Our assumption is that “holographic thought” utilizes the associative properties of holograms in “one shot”. Similarly we maintain that apprehension proceeds from the very beginning via two modes, the aural and the optical; whereas the verbal string is natural to the first, the pattern as such is natural to the second: the essentially instantaneous nature of the optical process captures the apprehension as a global unit whose meaning is expressed in the first place in terms of “associations” with other such units.
There we have our distinction: aural, verbal, and sequential on the one hand; optical, intuitive, and pattern-based on the other.
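Yevick’s appeal to the associative properties of holograms can be demonstrated in a few lines. Here is a minimal sketch in Python (numpy), following the general correlograph idea that Pribram and Longuet-Higgins worked with rather than Yevick’s specific formalism; the random patterns and sizes are my own toy assumptions. The association is stored as a single Fourier-domain filter, and retrieval happens “in one shot”, by a single correlation of the probe against the whole memory.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
key = rng.standard_normal(N)    # a stored "scene" (toy stand-in)
value = rng.standard_normal(N)  # the pattern associated with it

# Store the association as a single hologram-like filter in Fourier space.
memory = np.conj(np.fft.fft(key)) * np.fft.fft(value)

# Retrieval is "one shot": a single correlation of the probe against
# the whole memory, done here by one multiply in the Fourier domain.
def recall(probe):
    return np.real(np.fft.ifft(np.fft.fft(probe) * memory)) / N

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"probe with key:  {cos(recall(key), value):.2f}")  # clearly positive
print(f"unrelated probe: {cos(recall(rng.standard_normal(N)), value):.2f}")  # near zero
```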
Having chosen visual objects as her domain, she argues thus (and here I am quoting from a 1978 restatement):
We can explicate this proposition on a theoretical level in the domain of optical patterns. [...] Such patterns or objects are thin, white regions on a black background. These can be simple (regular), like the outlines of rectangles; or complex, like the outlines of Chinese characters or random-like motions. The following holds true: a complex object requires a long (sequential, quasi-linguistic) description but yields a sharp recognition (auto-correlation) spot under holographic filtering; hence it is identified most readily by holographic recognition, or holistically. A simple object requires a short (quasi-linguistic) description but yields a diffuse recognition spot; hence it is identified most readily by quasi-linguistic representation or description.
Description and holographic recognition thus appear as two (complementary) modes of identifying an object: the more complex the object, the longer its description and the sharper its auto-correlation spot, and vice versa. The more complex the physiognomy of a person, the more unique, and hence sharper, its identity and ease of recall; the more simple, the more common and hence “unidentifiable.” Perfect holographic recognition obtains for a totally “random object”, that is, one with an infinitely long description; for a perfectly sharp point the opposite is true.
Suppose that one is given a store of objects with which one is familiar, a holographic recognition device, and a quasi-linguistic mode of representation; one is then presented with an arbitrary object to be “identified.” An approximate match is obtained either by producing a description of acceptable length or by holographic recognition of a subset of similar (associated) objects from the store. The mode of identification that will be more appropriate then depends on the complexity of the unknown object. If it is simple, we “know” it by a short linguistic description; if it is complex, by the “associations” it evokes.
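That trade-off can be checked computationally. The sketch below is my own toy illustration, with one-dimensional patterns standing in for her two-dimensional outlines: it compares the sharpness of the autocorrelation spot of a simple object (a block) with that of a complex, random-like one, computed in the Fourier domain as a holographic filter would.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 512
simple = np.zeros(N)
simple[200:312] = 1.0                            # a simple, regular object
complex_ = (rng.random(N) < 0.5).astype(float)   # a complex, random-like object

def peak_sharpness(x):
    """Height of the central autocorrelation spot relative to the sidelobes."""
    x = x - x.mean()                              # remove the flat background
    ac = np.real(np.fft.ifft(np.abs(np.fft.fft(x)) ** 2))  # Wiener-Khinchin
    return ac[0] / np.abs(np.delete(ac, 0)).max()

print(f"simple object:  {peak_sharpness(simple):.1f}")   # near 1: diffuse spot
print(f"complex object: {peak_sharpness(complex_):.1f}") # several times larger: sharp
```

On these toy patterns the simple block’s ratio hovers near 1 (a diffuse recognition spot) while the random pattern’s is several times larger (a sharp spot), which is just the relationship Yevick describes.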
What Yevick is explicit about, and what is missing from Bengio, LeCun, and Hinton, is the relationship between an object of perception or cognition and the computational regime operating on that object. She recognizes the utility of both regimes, but associates them with different kinds of objects. Since Bengio, LeCun, and Hinton simply do not conceptualize that relationship, it is not clear how they would respond to Yevick’s work.
As I recall, that relationship was beginning to be recognized as an issue. Early AI had achieved its successes in sequential symbolic processing (e.g. theorem proving, expert systems), but faltered when dealing with visual perception and speech recognition. Though I can’t offer a citation, I recall David Marr, who died in 1980, mentioning the problem. The best-known statement of the problem is by Hans Moravec in his 1988 book, Mind Children, where he says “it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility” (as quoted in Wikipedia). While it is generally recognized within computer science at large that different kinds of computational systems are suited to different kinds of problems, so far as I know the issue has not been systematically investigated in the context of artificial intelligence and machine learning. More specifically, Miriam Yevick’s work from the 1970s has not been taken into account.
What needs to be done
Lots.
Let me repeat: Lots.
For one thing, Yevick’s work has been forgotten. It needs to be revived and vetted in view of more recent work.
Moreover, while her mathematics concentrated on one problem, object identification in the visual domain, she informally generalized that result to thinking in general. For example (I’m quoting from her 1978 paper):
If we consider that both of these modes of identification enter into our mental processes, we might speculate that there is a constant movement (a shifting across boundaries) from one mode to the other: the compacting into one unit of the description of a scene, event, and so forth that has become familiar to us, and the analysis of such into its parts by description. Mastery, skill and holistic grasp of some aspect of the world are attained when this object becomes identifiable as one whole complex unit; new rational knowledge is derived when the arbitrary complex object apprehended is analytically described.
I’m certainly sympathetic to that generalization. It’s what David Hays and I had in mind when we called on Yevick’s ideas in a 1987 paper on metaphor [1] and a 1988 paper on the brain and human intelligence [2].
For all I know, Bengio, LeCun, and Hinton might be sympathetic as well. Here’s the final paragraph of their Turing Award paper:
How are the directions suggested by these open questions related to the symbolic AI research program from the 20th century? Clearly, this symbolic AI program aimed at achieving system 2 abilities, such as reasoning, being able to factorize knowledge into pieces which can easily be recombined in a sequence of computational steps, and being able to manipulate abstract variables, types, and instances. We would like to design neural networks which can do all these things while working with real-valued vectors so as to preserve the strengths of deep learning which include efficient large-scale learning using differentiable computation and gradient-based adaptation, grounding of high-level concepts in low-level perception and action, handling uncertain data, and using distributed representations.
That sounds like a call to reconstruct symbolic capabilities in the context of more realistic models of real neural networks. That also sounds like intellectual work for several generations of researchers. As Charlie Parker was fond of saying, “Now’s the Time.”
References
[1] William Benzon and David Hays, Metaphor, Recognition, and Neural Process, The American Journal of Semiotics, Vol. 5, No. 1 (1987), 59-80. https://www.academia.edu/238608/Metaphor_Recognition_and_Neural_Process.
[2] William Benzon and David Hays, Principles and Development of Natural Intelligence, Journal of Social and Biological Structures, Vol. 11, No. 8, July 1988, 293-322. https://www.academia.edu/235116/Principles_and_Development_of_Natural_Intelligence.