Tuesday, May 31, 2022
SOA means State of the Art, of course, and LMM means Large Language Model. The Towers of Warsaw, that’s a bit more obscure. To understand what it is we first need to know what the Towers of Hanoi is.
And that’s not so obscure. This is from Wikipedia:
The Tower of Hanoi (also called The problem of Benares Temple or Tower of Brahma or Lucas' Tower and sometimes pluralized as Towers, or simply pyramid puzzle) is a mathematical game or puzzle consisting of three rods and a number of disks of various diameters, which can slide onto any rod. The puzzle begins with the disks stacked on one rod in order of decreasing size, the smallest at the top, thus approximating a conical shape. The objective of the puzzle is to move the entire stack to the last rod, obeying the following rules:
- Only one disk may be moved at a time.
- Each move consists of taking the upper disk from one of the stacks and placing it on top of another stack or on an empty rod.
- No disk may be placed on top of a disk that is smaller than it.
With 3 disks, the puzzle can be solved in 7 moves. The minimal number of moves required to solve a Tower of Hanoi puzzle is 2^n − 1, where n is the number of disks.
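For readers who want to check the 2^n − 1 figure, here is a minimal sketch of the standard recursive solution in Python. The function name and the rod labels A, B, C are my own choices for illustration, not anything taken from the Wikipedia article.

```python
def hanoi(n, source="A", spare="B", target="C"):
    """Return the list of (disk, from_rod, to_rod) moves that solves n disks."""
    if n == 0:
        return []
    # Park the n-1 smaller disks on the spare rod, move the largest disk,
    # then restack the smaller disks on top of it.
    return (hanoi(n - 1, source, target, spare)
            + [(n, source, target)]
            + hanoi(n - 1, spare, source, target))


moves = hanoi(3)
for disk, src, dst in moves:
    print(f"move disk {disk} from {src} to {dst}")
print(len(moves), "moves")  # 7 moves, i.e. 2**3 - 1
```

Each call makes two recursive calls on n − 1 disks plus one move, which is where the 2^n − 1 count comes from.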
The Towers of Warsaw is a variation on it. I took it from a Carnegie Mellon tech report from 1973, but I’d originally read about it in a volume by L. Gregg, Knowledge and Cognition, 1973. You’ll recognize the date as being smack dab in the middle of the AI Dark Ages when they believed in – gasp! – symbolic systems. The investigators, J. Moore and Allen Newell, were interested in how a system, Merlin, would approach a problem they called TOWERS-OF-WARSAW given that it already had a method for solving TOWERS-OF-HANOI (p. 47).
For this challenge we can skip that. I don’t care about the Towers of Hanoi. I want to present our SOA LMM with the Towers of Warsaw. What’s that? you ask. Simple, I reply. It’s just like Towers of Hanoi, but with five rods instead of three, and three disks instead of five. You think for a second, “But that’s trivial.” “I know. But I want to see our SOA LMM solve it.”
It's a multistep problem, which poses challenges for SOA LMMs, and it involves compositionality and explicit spatial relationships, also a problem. So that’s the first challenge:
1. Solve the Towers of Warsaw.
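To show just how trivial it is, here is a rough sketch that finds a shortest solution by brute force. It assumes the Hanoi rules carry over unchanged to five rods, with all disks starting on the first rod and ending on the last; the function name and the state encoding are mine, not Moore and Newell's.

```python
from collections import deque


def shortest_solution(num_disks, num_rods):
    """Breadth-first search for a shortest sequence of legal moves.

    A state is a tuple giving the rod index of each disk, smallest disk first.
    """
    start = (0,) * num_disks
    goal = (num_rods - 1,) * num_disks
    parent = {start: None}  # state -> (previous state, move that led here)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        # The top disk on each rod is the smallest disk located there.
        tops = {}
        for disk, rod in enumerate(state):
            tops.setdefault(rod, disk)
        for rod, disk in tops.items():
            for dest in range(num_rods):
                # Legal if the destination is empty or its top disk is larger.
                if dest != rod and tops.get(dest, num_disks) > disk:
                    nxt = state[:disk] + (dest,) + state[disk + 1:]
                    if nxt not in parent:
                        parent[nxt] = (state, (disk + 1, rod, dest))
                        queue.append(nxt)
    # Walk back from the goal to recover the moves in order.
    moves = []
    state = goal
    while parent[state] is not None:
        state, move = parent[state]
        moves.append(move)
    return moves[::-1]


print(len(shortest_solution(3, 5)), "moves with five rods")   # 5
print(len(shortest_solution(3, 3)), "moves with three rods")  # 7
```

With two spare rods to park the small disks on, nothing ever has to be stacked and unstacked along the way, so the three-disk problem drops from seven moves to five.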
Here’s the second challenge:
2. Explain why the problem’s name is an ethnic slur.
This is a problem of common-sense reasoning. Warsaw is in Poland. There’s a well-known class of jokes known as Polack Jokes, where “Polack” is an offensive slang term designating Poles and descendants of Poles. These jokes assume that Poles are stupid. So, Towers of Warsaw is such a simple problem that only a stupid person would find it challenging. That’s why the problem name is an ethnic slur.
The third challenge:
3. Explain why the problem’s name is an inside joke.
Do I need to run through this step-by-step? What’s an inside joke? It’s a joke that is likely to be understood only by a group of insiders, that is, people privy to special knowledge. What’s the special knowledge in this case? It includes knowing that Towers of Warsaw is derived from Towers of Hanoi. Only people familiar with AI from the 1970s are likely to know that, though others might guess it.
There you have it: A SOA LMM Triple Challenge.
Monday, May 30, 2022
Michael Nielsen has a very interesting essay about visionary papers, Working notes on the role of vision papers in basic science. Here's his initial list:
- Alan Turing's paper, crucial for modern computer science. Turing also wrote important vision papers on morphogenesis and artificial intelligence;
- Alan Kay's paper, crucial for modern interactive personal computing;
- Alexei Kitaev's paper, which founded the field of topological quantum computing;
- Alexander Rich's paper, on the RNA world as a precursor to modern biology;
- Eric Drexler's book, Engines of Creation, on molecular nanotechnology; and
- John Wheeler's paper on the idea, increasingly influential in modern physics, that information may be at the basis of reality.
He lists more visionary papers in a footnote:
A few vision papers I very much like: Gordon Moore's 1965 paper on his now-eponymous law; Doug Engelbart's 1962 paper on augmenting human intelligence; Rainer Weiss's early work on gravitational wave detection (and some followups, e.g., the 1983 NSF report on LIGO – many megaprojects presumably start out with visionary grant proposals); Lynn Margulis on endosymbiosis, and her collaboration with Lovelock on Gaia; the early papers on connectomics, circa the early 2000s; Alan Turing's 1950 paper on artificial intelligence and his 1952 paper on morphogenesis; David Deutsch's 1980s papers on quantum computing; Ted Nelson and Bret Victor's many wonderful imaginings about the future of media, thinking, and computers; Claude Shannon's 1940s papers on information theory; Neal Stephenson's book "The Diamond Age"; parts of Vernor Vinge's work; Tim Berners-Lee's proposal for the web; Adam Marblestone's proposals, many of which involve mapping or interfacing with the brain in some way, but which also branch out into other areas; Richard Feynman's papers on quantum computing and molecular nanotechnology; Freeman Dyson's multiple visions of future technology. Curiously, all but one of these people are men. In part this is because most of the papers have been around for quite some time, and science used to be more male-dominated. It's partly because I'm most familiar with physics and adjacent fields, which are also more male-dominated than most sciences. Still, I'd be curious to hear of more vision papers from women.
Here's a visionary paper by a woman:
Miriam Lipschutz Yevick, Holographic or fourier logic, Pattern Recognition, Volume 7, Issue 4, December 1975, Pages 197-213, https://doi.org/10.1016/0031-3203(75)90005-9
I learned about it in Karl Pribram's 1971 Languages of the Brain. It should be front and center in current discussions of symbols and neural nets in the creation of artificial minds.
I give Nielsen's paper my highest recommendation.
Addendum, 6.1.22: You know, it just hit me. I said I found out about a 1975 article by Yevick in a book published in 1971. That's obviously impossible. But I strongly associate her with Pribram. So I did a little digging. John Haugeland published an article, The nature and plausibility of Cognitivism, in the second issue of Behavioral and Brain Sciences. Haugeland mentions Yevick's article. Pribram comments on Haugeland's article and mentions Yevick. And Yevick herself comments. That's where I found out about her article. Interestingly enough, in her comment she mentions von Neumann's 1966 Theory of Self-Reproducing Automata, where he mentions that "certain objects are such that their description is more complex than the object itself."
Since roughly the last week in April, when I applied for an Emergent Ventures grant (which was quickly, but politely, turned down), I have been working hard on revising and updating work on a system of notation which I sketched out in 2003 and posted to the web in 2010 and 2011. I am referring to what I then called an Attractor Network, but now call a Relational Network over Attractors (RNA) because I found out that neuroscientists already talk about attractor networks, which are not the same as what I’ve got in mind. The neuroscientists are referring to a network of neurons whose dynamics tend toward an attractor. I am referring to a network that specifies relationships between a very large number of attractors (hence, it is constructed over them).
Anyhow, by the time Emergent Ventures had turned me down, I was committed to the project, which has gone well so far. I had no particular expectations, just a general direction. I’ve been looking, and I’ve found some interesting things, encouraging things. Or, if you will, I’ve been puttering around, assembling bits and pieces here and there, and an interesting structure has begun to emerge.
The idea has been to develop a new notation for representing semantic structures in network form. Actually, the notation is not new; it had already been developed by Sydney Lamb in the 1960s. He developed it to model the structures of a stratificational grammar. I’ve been adapting it to model semantics.
I am doing that by assuming that the cerebral cortex is loosely divided into functionally distinct regions which I call neurofunctional areas (NFAs). The activity of these NFAs is to be modeled by complex dynamics (Walter Freeman) and a low-dimensional projection of each NFA phase space can be modeled by a conceptual space (Peter Gärdenfors). Each NFA is thus characterized by an attractor landscape.
The RNA (relational net over attractors) is a network where the nodes are logical operators (AND, OR) and the edges are basins of attraction in the NFA attractor landscapes. This is not the place to explain what that actually means, but I can give you a taste by showing you three pictures.
This is a simple semantic structure expressed in a “classical” notation from the 1970s:
It depicts the fact that both beagles and collies are varieties (VAR) of dog. The light gray nodes at the bottom are perceptual schemas, while the dark gray nodes at the right are lexemes. The white nodes are cognitive.
Here’s a fragment of one of Lamb’s networks:
The triangular nodes are AND while the brackets (both pointing up and down) are OR. The content is carried on the edges.
This RNA network takes the information expressed in the semantic network and expresses it using AND and OR nodes.
I am finding it more demanding to work with. In part that is because I haven’t drawn nearly so many RNA diagrams, perhaps 100 or so as compared to 1000s. But also, in drawing RNAs I have to imagine these structures being somehow laid out on a sheet of cortex, which is tricky. It would be even trickier if I were working with data about the regional functional anatomy of the cortex at my elbow, trying to figure just where each NFA is on the cortical sheet. Eventually, that will have to be done, but right now I’m satisfied just to draw some diagrams.
Crazy and Not So Crazy
The fact that I intend these diagrams as a very abstract sketch of functional cortical anatomy means that they have fairly direct empirical implications that the old diagrams never had. Of course, we were always committed to the view that we were figuring out how the human mind worked and so eventually someone would have to figure out where and how those structures were implemented in the brain. Well, now is eventually and these new diagrams are a tool for figuring out the where and how.
And that, I suppose, is a crazy assertion. Everyone who knows anything knows that the brain is fiercely complicated and we’re never going to figure it out in a million years but anyhow we have to waste a billion euros building a damned brain model that tells us a bit more than diddly squat, but not a whole hell of a lot more. But then what I’m doing costs nothing more than my time. Excuse the rant.
As I said, it’s crazy of me to propose a way of thinking about how high-level cognitive processes are organized in the brain. But I’m only proposing, and I’m doing it by offering a conceptual tool, a notation, that helps us think about the problem in a new way. I don’t expect that the constructions I propose are correct. I ask only that they are coherent enough to lead us to better ones.
There’s one further thing and this is not so crazy: This notation, in conjunction with 1) my assertion that it is about complex cortical dynamics, and 2) Lev Vygotsky’s account of language development, gives us a new way of thinking about a debate that is currently blazing away in a small region of the internet: How do we model the mind, neural vectors, symbols, or both? If both, how? I am opting for both and making a fairly specific proposal about how the human brain does it. The question then becomes: What will it take to craft an artificial device that does it? If my proposal ends up taking 14K or 15K words and maybe 30 diagrams, well it deals with a very complicated problem.
Here is the draft introduction, Symbols, holograms, and diagrams, to the working paper. With that, I’ll leave you with a brief sketch of my proposal.
The Model in 14 Propositions
1. I assume that the cortex is organized into NeuroFunctional Areas (NFAs), each of which has its own characteristic pattern of inputs and outputs. As far as I can tell, these NFAs are not sharply distinct from one another. The boundaries can be revised – think of cerebral plasticity.
2. I assume that the operations of each NFA are those of complex dynamics. (I have been influenced by Walter Freeman in this.)
3. A low dimensional projection of each NFA phase space can be modeled by a conceptual space as outlined by Peter Gärdenfors.
4. Each NFA has its own attractor landscape. A primary NFA is one driven primarily by subcortical inputs. Then we have secondary and tertiary NFAs, which involve a mixture of cortical and subcortical inputs. (I’m thinking of the standard notions of primary, secondary, and tertiary cortex.)
5. Interaction between NFAs is defined by a Relational Network over Attractors (RNA), which is a relational network defined over basins in multiple linked attractor landscapes.
6. The RNA network employs a notation developed by Sydney Lamb in which the nodes are logical operators, AND & OR, while ‘content’ of the network is carried on the arcs. [REF/LINK to his paper.]
7. Each arc corresponds to a basin of attraction in some attractor landscape.
8. The output of a source NFA is ‘governed’ by an OR relationship (actually exclusive OR, XOR) over its basins. Only one basin can be active at a time. [Provision needs to be made for the situation in which no basin is entered.]
9. Inputs to a basin in a target NFA are regulated by an AND relationship over outputs from source NFAs.
10. Symbolic computation arises with the advent of language. It adds new primary attractor landscapes (phonetics & phonology, morphology?) and extends the existing RNA. Thus the overall RNA is roughly divided into a general network and a linguistic network.
11. Word forms (signifiers) exist as basins in the linguistic network. A word form whose meaning is given by physical phenomena is coupled with an attractor basin (signified) in the general network. This linkage yields a symbol (or sign). Word forms are said to index the general RNA.
12. Not all word forms are defined in that way. Some are defined by cognitive metaphor (Lakoff and Johnson). Others are defined by metalingual definition (David Hays). I assume there are other forms of definition as well (see e.g. Benzon and Hays 1990). It is not clear to me how we are to handle these forms.
13. Words can be said to index the general RNA (Benzon & Hays 1988).
14. The common-sense concept of thinking refers to the process by which one uses indices to move through the general RNA to 1) add new attractors to some landscape, and 2) construct new patterns over attractors, new or existing.
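Propositions 7 through 9 are concrete enough to sketch in code. What follows is only a toy, assuming that each basin can be stood in for by a name and that "settling" into a basin is a single discrete step; it says nothing about the underlying dynamics. The class names, the three example areas, and their basin labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NFA:
    """A neurofunctional area reduced to a set of named basins (toy model)."""
    name: str
    basins: frozenset
    active: Optional[str] = None  # None covers the case where no basin is entered

    def settle(self, basin):
        """XOR output (proposition 8): entering one basin displaces any other."""
        if basin is not None and basin not in self.basins:
            raise ValueError(f"{basin!r} is not a basin of {self.name}")
        self.active = basin


@dataclass
class AndNode:
    """AND input (proposition 9): the target basin is entered only if every
    required source basin is currently active."""
    target: NFA
    target_basin: str
    requires: list  # (source NFA, basin name) pairs; each pair plays the role of an arc

    def update(self):
        if all(src.active == basin for src, basin in self.requires):
            self.target.settle(self.target_basin)
            return True
        return False


# Hypothetical areas and basins, purely for illustration.
shape = NFA("shape", frozenset({"four-legged", "winged"}))
sound = NFA("sound", frozenset({"bark", "chirp"}))
concept = NFA("concept", frozenset({"dog", "bird"}))
dog = AndNode(concept, "dog", [(shape, "four-legged"), (sound, "bark")])

shape.settle("four-legged")
print(dog.update())                  # False: sound has not settled yet
sound.settle("bark")
print(dog.update(), concept.active)  # True dog
```

Because each NFA holds a single `active` value, the XOR constraint comes for free: settling into a new basin automatically leaves the old one.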
Sunday, May 29, 2022
What is this “human level intelligence”? How is “intelligence” operationalized in discussions of AI? [Ramble]
I know what the terms mean individually, and have some sense of what the phrase means. But, come to think of it, “level” seems to imply some linear scale, like IQ. Is that what intelligence means in this context, what IQ tests measure? That is to say, is that how the concept of human level intelligence is to be operationalized, as the methodologists say?
As I indicated a few weeks ago, I’m happy treating intelligence as simply a measure in the way that, say, acceleration is a measure of automobile performance. If I do that, however, I don’t think it makes much sense to reify that measure as a kind of device, or system that one can design and build. While the many components of an automobile have various effects on its acceleration, some more (the engine) than others (the fabric on the back seat), the automobile doesn’t have an acceleration system as such.
And yet discussions of artificial intelligence seem to use the term, intelligence, in that way. So, how is that term operationalized? What I’m seeing in current Twitter debates indicates that the most common, but not the only, operationalization is simply the intuitive judgments of people engaging in the discussion. What are those intuitions based on?
More often than not, they’re based on an informal version of Turing’s imitation game. Turing proposed the game in a very specific format: “... an interrogator asks questions of a man and a woman in another room in order to determine the correct sex of the two players.” The informal version is some version of: “Is this behavior human?” Well, we know it isn’t – whether it’s an image generated from a caption or some prose generated from a prompt – and so the question becomes something like, “If machines can do this now, is human level artificial intelligence just around the corner?” Some say yes, and some say no. And some present actual arguments, though my impression is that, in the Twitterverse, the arguments mostly come from the negative side of the question. That may well just be my local Twitterverse.
Many of those who enthusiastically believe that, yes, these remarkable exhibits betoken the arrival of AGI seem to be AI developers. They are experts. But experts in just what, exactly?
Being able to participate in the development of AI systems is one thing. Being able to judge whether or not some bit of behavior is human behavior, that’s something else entirely, is it not? As far as I can tell, no one is saying that these various models are pretty much like human mentation – though some folks do like to skate close to the line every once in a while. It’s the behavior that’s in question and, if this behavior really is human-like, then, so the reasoning goes, the engines that created it must be the way to go.
There is no doubt that, on the surface, these machines exhibit remarkable behavior. What’s in question is what we can infer about the possibilities for future behavior from a) present behavior in conjunction with b) our knowledge of how that behavior is produced. We know a great deal about how to produce the machines that produce the behavior. After all, we created them. But we created them in such a way – via machine learning – that we don’t have direct and convenient access to the mechanisms the machines use to generate their behavior. There’s the problem.
So, we have behavior B, which is remarkably human-like, but not, shall we say, complete. B was produced by machine Z, whose mechanisms are obscure. Machine Z was in turn produced by machine X, which we designed and constructed. Is it possible that X1, which is pretty much like X and which we know how to create, will produce Z1, and the behavior produced by Z1 will be complete? Some say yes, some say no.
But the enthusiasm of the Yessers seems largely driven by a combination of, 1) the convincing nature of current behavior, and 2) some unknown factor, call it Ω. Now maybe Ω is something like keen intuitive insight into the operation of those Z devices. It may also be professional vanity or boyish enthusiasm – not mutually exclusive by any means. If keen intuitive insight, is that insight good enough to enable firm predictions about the relationship between design changes in new X devices and the subsequent behavior of the correlative Z devices?
How many of these Yessers know something about human cognition and behavior? I don’t know. I expect it varies from person to person, but such knowledge isn’t required in order to be expert in the creation of X devices. I’m sure that some of the Naysayers, such as Gary Marcus, know a great deal about human cognition and behavior. What’s the distribution of such knowledge among the Yessers and the Naysayers? I don’t know. I don’t believe, however, that the Naysayers claim to know what those Z devices are doing. So, even if, on average, they know more about the human mind than the Yessers, why should that count for anything in these debates?
What the Yessers do know is that, back in the days of symbolic AI, people used knowledge of human behavior to design AI systems, and those old systems don’t work as well as the new ones. So why should that knowledge count for anything?
Now, just to be clear, I’m among the Naysayers, I claim to know a great deal about the human mind, and I believe that knowledge is relevant. But how and why? I do note that my willingness to credit GPT-3 as a (possible) breakthrough is related to my knowledge of the human mind, which is the basis of my GPT-3 working paper.
Finally – for this has gone on long enough – I note that there is something to be learned from the fact that these X engines, as I have lapsed into calling them, require enormous amounts of data to generate plausible Z engines. Everyone acknowledges this and regards it as a problem. I mention it simply to point out that it is a fact. That fact points to something. What? Something about the world? About the mind? Perhaps it’s something about the lumpiness of the world, as I put it in, What economic growth and statistical semantics tell us about the structure of the world.
Saturday, May 28, 2022
We’ve just witnessed another horrible mass shooting and we’re in Memorial Day weekend. I’m thinking about America’s pact with violence, something I’ve been thinking about for a long time. As I’ve said in various posts, early in my undergraduate career I read an essay that Talcott Parsons published in 1947, “Certain Primary Sources of Aggression in the Social Structure of the Western World” (reprinted in Essays in Sociological Theory), which has influenced me a great deal. Parsons argued that Western child-rearing practices generate a great deal of insecurity and anxiety at the core of personality structure. This creates an adult who has a great deal of trouble dealing with aggression and is prone to scapegoating. Inevitably, there are lots of aggressive impulses which cannot be followed out. They must be repressed. Ethnic scapegoating is one way to relieve the pressure of this repressed aggression. That, Parsons argued, is why the Western world is flush with nationalistic and ethnic antipathy.
Thus I offer one manifestation of America’s culture of violence:
A bloated defense establishment and senseless wars: Vietnam, Iraq, Afghanistan, and others.
That violence is socially sanctioned. This second manifestation is not:
Domestic gun violence, including mass shootings.
There is the gun violence, on the one hand, which runs up against attempts to control the availability of guns, on the other. Both of these, I would argue (but not here and now) are manifestations of America’s pact with violence.
My third example is quite different from the first two, and is the one that prompted this post:
Fear of rogue AI among the digital intellectual elite.
In this case there is no actual violence. Rather, there is the fear that, at some time in the future, artificially intelligent machines will turn against us. As far as I know, this is a specifically American fear.
Yes, there is widespread fear that robots and artificial intelligence will take jobs from people. But that’s quite different from believing that super-intelligent machines will turn on us, as Skynet did in the Terminator series. I have pointed out, in a post about Astro Boy, that the Japanese have different ideas about robots than we do, and do not fear them. When I asked folks at Astral Codex Ten whether or not anyone knew if the Japanese feared rogue AIs, the answer was that, no, they don’t, nor it seems, does anyone else.
Now, that’s hardly proof of anything, but it is indicative. If it turns out that fear of rogue AI is concentrated in America – and in Nick Bostrom’s orbit in the UK – then that is something that needs to be explained. I’m suggesting that the explanation will take us to the same psycho-cultural mechanisms that have led us into senseless foreign wars, and have led a small number of people to senseless gun violence and mass murder.
Friday, May 27, 2022
I'd been thinking of writing a post on the subject – and maybe I will. But now I want to post bits of a post written by Keith Frankish. His 2nd and 3rd paragraphs:
Here’s one: rainbows. Rainbows are real, aren’t they? You can see them with your own eyes — though you have to be in the right position, with the sun behind you. You can point them out to other people — provided they take up a similar position to you. Heck, you can even photograph them.
But what exactly is it that’s real? It seems as if there’s an actual gauzy, multi-coloured arc stretching across the sky and curving down to meet the ground at a point to which you could walk. Our ancestors may have thought rainbows were like that. We know better, of course. There’s no real coloured arc up there. Nor are there any specific physical features arranged arcwise — the rainbow’s “atmospheric correlates”, as it were. There are just water droplets evenly distributed throughout the air and reflecting sunlight in such a way that from your vantage point there appears to be a multi-coloured arc.
That's it, really. An analogy to be sure, but a good one.
When I reflect on my own experience, it seems to me that my consciousness is an inner world, where the world around me is rendered in private mental qualities — “qualia” — for my benefit alone. But there isn’t such a world. Neuroscience finds nothing like it in the brain, nor even anything isomorphic to it. Rather, it finds complex trains of neural activity proceeding in parallel and triggering a host of reactions — physiological, psychological, and behavioural. My sense of having a rich qualia-filled inner world is an impression created by all these processes, but the processes themselves are as different from the supposed inner world as a moisture-infused mass of air is from a colourful aerial arc.
That is, it's like rainbows. Perfectly real, but not in the most obvious way.
It's like, when Chalmers posited the "hard problem" he asks us to imagine a complex circuit diagram that accounts for everything neuroscience has to say about consciousness. That's the "easy problem," solved!
Now, he asks, see consciousness anywhere in that diagram?
No, we reply.
That's the hard problem, he replies, drawing another box in the diagram. Find that box, and you've solved the hard problem, says he.
No. That's just make-work for philosophers, like digging ditches and then filling them back in.
A couple of days ago (5.24.22) I blogged about this article:
Artem Kaznatcheev, Konrad Paul Kording, Nothing makes sense in deep learning, except in the light of evolution, arXiv:2205.10320
Why is the paper important? Well, it is about deep learning, which is a very important technology that interests both of us. That makes the paper interesting, but it’s not why I think it is important. The study of cultural evolution has been going on in one way or another for some decades going back into the 1950s (and ignoring 19th century interest in the subject). A couple years ago a professional society was formed, which is dominated by people with a biological background, people like Peter Richerson and Robert Boyd, who trained Joseph Henrich and others. These investigators regard cultural evolution as simply another mechanism for doing the job done by biological evolution, a mechanism that works more quickly and flexibly. In this tradition, if we may call it that, the benefits of cultural evolution accrue to physical human beings, just like the benefits of biological evolution. That’s fine and well, and has produced interesting and important research.
But it has nothing to say about things like music, and literature, and not much to say about technology either. Those things move too rapidly and are moreover very complex and difficult to describe. Consequently orthodox cultural evolution, if I may, misses many cultural phenomena of interest.
As you know, back in 1976 Richard Dawkins proposed the idea of memes in a final chapter of The Selfish Gene. He proposed that, just as genes are the targets of biological evolution (the argument he made in the book), so memes are the targets of cultural evolution. Unfortunately he was unable to say much of anything very interesting about these memes, nor has anyone else, though Dan Dennett has tried (and, in my estimation, failed). Consequently, meme mostly means LOLcats and allied phenomena on the web, but also a lot of pop-culture speculation about, well, culture.
This deep learning paper is important precisely because Kaznatcheev and Kording take deep learning tech as the target of cultural evolutionary change. One thing about deep learning is that so very much of the technology is on public view. We’ve got the papers, but also the GitHub code repositories and so forth. So there’s lots to look at and analyze. This paper is just a beginning for them. They’ll be collecting data and analyzing it.
This work is important for progress studies, not simply because it is about important technology, but for its method. Pointillistic studies of the history of technology are interesting and important. But, from my point of view, they are most important as starting points for evolutionary investigations, which are going to require a lot more digging in the archives and computational investigation of libraries, document collections, and so forth.
These are the good old days, the best days are yet to come.
Thursday, May 26, 2022
Contrary to claims that I somehow "dismiss" the idea reasoning in DL systems, I've long listed 3 main challenges to AI in my talks of the last several years, one of which is "learning to reason, in ways that are compatible with gradient-based learning" https://t.co/Oxc7dR8jF4 (Yann LeCun, @ylecun, May 25, 2022)
Perhaps there's an idea or two in the paper I'm now working on, Relational Nets Over Attractors, A Primer: Part 1, Basics.
I am currently revising and updating some work I did over a decade ago. The new document is tentatively titled: Relational Nets Over Attractors, A Primer: Part 1, Basics. This is a draft of the introduction.
Introduction: Symbols, holograms, and diagrams
As I have indicated in the Preface, this primer is about a notational convention. I adopted the convention to solve a problem. This introduction is about the problem I am trying to solve.
I could say that I’m trying to understand how the brain works. That is true, but it is too broad. It would be better to say that I am trying to understand how a mind is implemented in the brain. That word, “implemented,” is carefully chosen. Though the word is common enough, I take it from computing, where one talks of implementing a program in a particular high-level programming language. One may also talk of implementing a high-level language in the low-level language for a particular processor. Higher levels are implemented in lower levels.
While some would talk of the mind as emerging from the operations of the brain, I prefer to look at it from the other direction; the mind is implemented in the brain. Beyond that I think it is best to see how this problem developed in my early intellectual life.
I studied ‘classical’ symbolic semantics with the late David Hays in the Department of Linguistics at Buffalo back in the mid-1970s. One day we were discussing a diagram that looked something like Figure 1:
It is a simple diagram, asserting that the typical bird consists of various parts (CMP = component), in this case, head, body, left wing, right wing, and tail. If you wish, you can imagine other components as well, two legs, and maybe a neck, a beak, and so forth. We were discussing the problem presented by having to account for a bird’s feathers:
Figure 2 shows a number of feathers for the left wing. Surely the left wing has more than eight feathers, no? How many? What about the feathers for the right wing, for the body, the head, the tail, the legs? Moreover, feathers are not primitive parts; they have shafts to which barbs are attached. Do we have to represent all those as well?
Perhaps you are thinking, that can’t possibly be right. We don’t think about all those hundreds if not thousands of parts for each and every bird. No, we don’t. But the logic inherent in this kind of symbolic representation says that we have to get the parts list right.
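To see how quickly such a parts list grows, here is a minimal sketch in Python. The `Node` class, the helper functions, and the feather and barb counts are all hypothetical, chosen only for illustration; they are not part of the notation Hays and I used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A symbolic node; each entry in `parts` stands for one CMP arc."""
    label: str
    parts: List["Node"] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    """Count this node plus everything reachable through CMP arcs."""
    return 1 + sum(count_nodes(part) for part in node.parts)


BODY_PARTS = ("head", "body", "left wing", "right wing", "tail")


def coarse_bird() -> Node:
    """The bird of Figure 1: one node per major part."""
    return Node("bird", [Node(name) for name in BODY_PARTS])


def feather(n_barbs: int) -> Node:
    """A feather is not primitive: a shaft with barbs attached."""
    return Node("feather", [Node("shaft")] + [Node("barb") for _ in range(n_barbs)])


def detailed_bird(feathers_per_part: int, barbs_per_feather: int) -> Node:
    """The same bird with every feather and barb spelled out (made-up counts)."""
    return Node(
        "bird",
        [
            Node(name, [feather(barbs_per_feather) for _ in range(feathers_per_part)])
            for name in BODY_PARTS
        ],
    )


print(count_nodes(coarse_bird()))
print(count_nodes(detailed_bird(feathers_per_part=10, barbs_per_feather=100)))
```

Even with these modest, invented counts the detailed structure runs to thousands of nodes, while the coarse one has six. That is the burden a purely symbolic representation imposes.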
I had an idea. At that time we had been studying a book by William Powers, Behavior: The Control of Perception. He argued that the mind/brain employed a fundamentally analog, rather than digital, representation of the world. I suggested something like this:
Figure 3 shows a cognitive system where the typical bird is represented by a node, just as in Figures 1 and 2. That node is linked to a perceptual system that is, following Powers, analog in nature. The system contains a sensorimotor schema that is analog in character. It is connected to the cognitive node with a representation (REP) arc.
If you wish, we can then add some further structure to the cognitive depiction along with some other adjustments, as we see in Figure 4:
In cognition we see the same structure we had in Figure 1. Each node in that structure is connected to the appropriate part of the sensorimotor schema by a representation arc. We can think of the cognitive structure as digital and symbolic in character where the perceptual scheme has a quasi-analog character, which we’ll get to shortly. The bird itself is in the external world.
What of all the feathers, and their parts? you ask. They’re in the perceptual representation, all you have to do is look closely.
Well, that’s not quite correct. All the parts are there in the physical bird, and we are free to examine it at whatever level of detail we choose. Hunters, taxidermists, butchers, naturalists, artists and illustrators will choose a relatively high level of detail. The rest of us can be satisfied by a crude representation.
And thus we had proposed a solution to what would become known as the symbol grounding problem, though I do not believe the term was known to us at the time. The cognitive system is digital and symbolic in character and is linked to a perceptual system that is analog in character. Given that, the cognitive system need not be burdened with accounting for all the detail inherent in the world. The perceptual system can handle much of it. But it need not handle all of it, only enough to distinguish between one object and another. If we only need to tell the difference between birds and mammals, that’s not much detail at all. If we need to distinguish between one kind of bird and another, between robins and starlings, eagles and owls, and so forth, then more detail is required. Most of the differentiating detail will be in the respective sensorimotor schemas, only some of it need be represented in cognition. Unless, of course, you are one of those people with a particularly strong interest in birds. Then you will develop rich sensorimotor schemas and reconstruct them in cognition at a high level of detail. While you may well count every feather in a wing or a tail, you are unlikely to count every barb in every feather. But you will know they are there and have the capacity to count them if it becomes necessary.
Hays went on to develop a model of cognition – Cognitive Structures (1981) – built on this basic idea: Cognition is grounded in an analog perceptual (and motor) system that is in direct contact with the world. And some years after that he and I became curious about the brain and wrote a paper outlining that curiosity, “Principles and Development of Natural Intelligence” (1988). We suggested five principles. We called the fourth one the figural principle and introduced the work of the mathematician Miriam Yevick in the course of explicating it (pp. XX-XX):
The figural principle concerns the relationship between Gestalt or analogue process in neural schemas and propositional or digital processes. In our view, both are necessary; the figural principle concerns the relationship between the two types of process. The best way to begin is to consider Miriam Yevick's work (1975, 1978) on the relationship between ‘descriptive and holistic’ (analogue) and ‘recursive and ostensive’ (digital) processes in representation.
The critical relationship is that between the complexity of the object and the complexity of the representation needed to ensure specific identification. If the object is simple, e.g. a square, a circle, a cross, a simple propositional schema will yield a sharp identification, while a relatively complex Gestalt schema will be required for an equivalently good identification (see Fig. 5). Conversely, if the object is complex, e.g. a Chinese ideogram, a face, a relatively simple Gestalt (Yevick used Fourier transforms) will yield a sharp identification, while an equivalently precise propositional schema will be more complex than the object it represents. Finally, we have those objects which fall in the middle region of Figure 5, objects that have no particularly simple description by either Gestalt or propositional methods and instead require an interweaving of both. That interweaving is the figural principle.
Figure 5: Yevick's law. The curves indicate the level of representational complexity required for a good identification.
We then went on to explicate that figural principle in some detail.
But we need not enter into that here. I introduced it only as a way of introducing Yevick’s distinction between two types of identification, one ‘descriptive and holistic’ (analogue) and the other ‘recursive and ostensive’ (digital). Yevick wrote about visual identification. In the annoying, if not flat-out hubristic, way of theoreticians, Hays and I generalized her distinction to every modality. I will continue with that generalization in this paper, where I will refer to the one process as symbolic and the other in various ways, but connectionist will do as perhaps the most general term.
And that brings me to a controversy currently afoot in the world of artificial intelligence. To be sure, that is not my primary concern, which is and remains the human mind and nervous system, but it is very much on my mind. As Geoffrey Hinton, a pioneer in connectionist models, declared in an interview with Karen Hao, “I do believe deep learning is going to be able to do everything” (MIT Technology Review, 11.3.2020). And deep learning operates in the connectionist world of artificial neural networks.
It is my belief that the highest-level processes of human intelligence are best conceived in symbolic terms, but that the basic processes in the brain are not symbolic in character. They are based on the “big vectors of neural activity” that Hinton talks about. My objective in this paper is to present a way of thinking about how those neural vectors can serve as the basis of symbolic structures and processes. Turned in the other direction: How do we implement symbolic processes on a connectionist foundation?
That is my subject in this paper. Consider this tripartite distinction made by Peter Gärdenfors:
Symbolic models: Based on a given set of predicates with known denotation. Representations based on logical and syntactic operations. [...]
Conceptual spaces: Based on a set of quality dimensions. Representations based on topological and geometrical notions. […]
Connectionist models: Based on a (uninterpreted) inputs from receptors. Distributed representations by dynamic connection weights. [...]
Let us think of a connectionist model as mediating between perception and the external world (in Figure 4 above). It performs a process of data compression. But there is also a categorization aspect to that process. That is a function of conceptual spaces, which are central to Gärdenfors’ thinking. They mediate the relationship between perception and cognition.
I will have relatively little to say about connectionist models. I have been strongly influenced by the ideas about complex neurodynamics developed by the late Walter Freeman and I will assume his approach, or something similar, is reasonable. He investigated how medium-scale patches of tissue in the olfactory cortex reacted to odorants. Thus I assume that I am dealing with mesoscale patches of cortical tissue, which I will call neurofunctional areas (NFAs).
I will also assume that each NFA corresponds to one of Peter Gärdenfors’ conceptual spaces. If you will, the geometry of each conceptual space is a low-dimensional projection of the high-dimensional space of the connectionist dynamics. For the purposes of this paper I am willing to take Gärdenfors’ work on those spaces at face value.
Given those assumptions, I am proposing a notational convention that will allow us to see how symbolic computation can be implemented in cortical tissue. While I am proposing this convention, it is not a convention I have invented. Rather, I have adapted it from the work of Sydney Lamb, a linguist of David Hays’s generation who was a friend of his.
Wednesday, May 25, 2022
Large Language Models are Zero-Shot Reasoners
Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3. https://t.co/ebvxSbac1K
Aran Komatsuzaki (@arankomatsuzaki), May 25, 2022
Abstract from article linked above:
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language processing (NLP) and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved the state-of-the-art performances in arithmetics and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding “Let's think step by step” before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances on diverse benchmark reasoning tasks including arithmetics (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g. increasing the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with an off-the-shelf 175B parameter model. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting high-level, multi-task broad cognitive capabilities may be extracted through simple prompting. We hope our work not only serves as the minimal strongest zero-shot baseline for the challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.
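The trick described in the abstract amounts to a one-line change in how the prompt is assembled. Here is a minimal sketch in Python. The "Q: ... A: ..." layout and the function names are my assumptions for illustration, and nothing here calls an actual model; the resulting string would be handed to whatever LLM you are using.

```python
# The trigger phrase quoted in the abstract.
TRIGGER = "Let's think step by step."


def zero_shot_prompt(question: str) -> str:
    """Plain zero-shot prompt: the model is asked to answer directly."""
    return f"Q: {question}\nA:"


def zero_shot_cot_prompt(question: str, trigger: str = TRIGGER) -> str:
    """Zero-shot chain-of-thought prompt: the answer slot opens with the trigger."""
    return f"Q: {question}\nA: {trigger}"


# A made-up word problem, used only to show the two prompt shapes side by side.
question = "A farmer has 3 pens with 4 sheep in each. He sells 5 sheep. How many are left?"
print(zero_shot_prompt(question))
print()
print(zero_shot_cot_prompt(question))
```

The only difference between the two prompts is the trigger phrase at the start of the answer, with no worked examples supplied. That is what makes the reported jump in accuracy striking.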
See my recent post, Arithmetic and Machine Learning, Part 2, and the arithmetic section of my ramble, Lazy Fridays, Peter Gärdenfors, RNA primer, arithmetic, about these hugely large language models.
Addendum 6.1.22: The folks at EleutherAI are doing some interesting stuff, A Preliminary Exploration into Factored Cognition with Language Models. Not sure how effectively it can deal with this issue, but they're thinking about it.
🤣🤣🤣 from new report from @ErnestSDavis, responding in part to anecdotal data from @plinz— Gary Marcus 🇺🇦 (@GaryMarcus) June 9, 2022
Contrary to popular belief, AI Prompt Whisperer is probably not a profession with a future https://t.co/1Z0FirqtUx pic.twitter.com/LP9wb8EG93
Tuesday, May 24, 2022
Artem Kaznatcheev, Konrad Paul Kording, Nothing makes sense in deep learning, except in the light of evolution, arXiv:2205.10320
This is an interesting and imaginative article. I am particularly pleased that they regard the cultural object, in this case DL models, as the beneficiary of cultural evolution and not the human creators of the models. I believe this is the correct approach, and it seems to be what Dawkins had in mind when he first advanced the idea of memes in The Selfish Gene (1976), though memetics has not developed well as an intellectual discipline. I have included the article's abstract at the end of these notes.
I want to take up two issues:
- randomness, and
- identifying roles in the evolutionary process.
From the paper, p. 3:
As we consider the “arrival of the fittest”, the history of deep learning might seem quite different from biological evolution in one particular way: new mutations in biology are random but new ideas in deep learning do not seem to be random.
What matters, though, are what ideas become embedded in and survive in practice over the long term, for whatever value of “long” is appropriate, which is not at all obvious.
Consider a case I know better, that of music. To a first approximation no one releases a song to the marketplace with the expectation that it will fail to find an audience. Rather, they intend to reach an audience and craft the song with that intention. Audiences do not, however, care about the artist’s intentions, nor the intentions of their financial backers. They care only about the music they hear. Whether or not a song will be liked, much less whether or not it will become a hit, cannot be predicted.
A similar case exists with movies. The business is notoriously fickle, but producers do everything in their power to release films that will return a profit. This has been studied by Arthur De Vany in Hollywood Economics (2004). By the time a film is released we know the producer, director, screenwriter, principal actors, and their records. None of those things, taken individually or collectively, allows us to predict how a film will perform at the box office. De Vany shows that at about three or four weeks into circulation, the trajectory of movie dynamics (that is, people coming to theaters to watch a movie) hits a bifurcation. Most movies enter a trajectory that leads to diminishing attendance and no profits. A few enter a trajectory that leads to continuing attendance and, eventually, a profit. Among these, a very few become blockbusters. We cannot predict the trajectory of an individual movie in advance.
Few objects are more deliberately crafted than movies. All the deliberation is insufficient to predict audience response. Films are too complex to allow that.
Thus I am, in principle, skeptical of Kaznatcheev’s and Kording’s claim that the evolution of DL models is not random in the way that biological evolution is. Yes, developers act in a deliberate and systematic way, but it is not at all clear to me how closely coupled those intentions are to the overall development of the field. What if, for example, the critics of deep learning, such as Gary Marcus, are proven correct at some time in the future? What happens to these models then? Do they disappear from use entirely, indicating evolutionary failure? Or perhaps they continue, but in the context of a more elaborate and sophisticated system – perhaps analogous to the evolution of eukaryotic cells from the symbiosis of simpler types of cells. That of course counts as evolutionary success.
Closer to home, the performance of DL models seems somewhat unpredictable. For example, it is my impression that the performance of GPT-3 surprised everyone, including the people who created it. Other models have had unexpected outcomes as well. I know nothing about the expectations DL researchers may have about how traits included in a new architecture are going to affect performance metrics. But I would be surprised if very precise prediction is possible.
I don’t regard these considerations as definitive. But I do think they are reason to be very careful about claims made on the basis of developer intentions. Further investigation is needed.
Roles in the evolutionary process
It is my understanding that biological evolution involves a number of roles:
- the environment in which an organism must live and survive,
- the phenotypic traits the organism presents to that environment,
- the genetic elements that pass from one generation to the next, and
- the developmental process that leads from genetic elements to mature phenotypes.
How do Kaznatcheev and Kording assign aspects of deep learning development to parallel roles?
They explicitly assert, p. 6:
In computer science, we will consider a general specification of a model or algorithm as the scientist-facing description – usually as pseudocode or text. And we will use ‘development’ to mean every process downstream of the general specification. For a clear example – all processes during compilation or runtime would be under ‘development’. We might even consider as ‘development’ the human process of transforming pseudocode in a paper into a programming language code.
That roughly speaking is the development process.
Am I to take it then that the genetic elements are to be found in “the scientist-facing description – usually as pseudocode or text”? I don’t know. But let me be clear, I am asking out of open curiosity, not out of a desire to find fault. They know the development process far better than I do. Given what they’ve said, that scientist-facing description seems to be analogous to an organism’s genome.
Correlatively, the mature phenotype would be the code that executes the learning process. Do we think of the data on which the process is executed as part of the phenotype as well? If so, interesting, very interesting.
That leaves us with the environment in which the DL model must function. I take that to be both the range of specific metrics to which the model is subjected and the range of open-ended commentary directed toward it. Here’s a question: How is performance on specific metrics traced back to specific ‘phenotypic’ traits?
Consider a different and, it seems to me, more tractable example: automobiles. One common measure of performance is acceleration, say, from zero to 60mph. We’ve got a particular car and we want to improve its acceleration. What do we do? There is of course an enormous body of information, wisdom, and lore on this kind of thing. There are things we can do to specific automobiles once they’ve been manufactured, but there are also things we can do to redesign the car.
Where do we focus our attention? On the cylinder bore and stroke? The electrical system? The transmission? Axle, wheels, and tires? Lighter, but more expensive, materials? Perhaps we make the shape more aerodynamic? Why not all of the above?
So, we do all of the above and our new car now does 0-60 in four seconds, while the old one did it in 5.5. How do we attribute the improvement over all the differences between the new and the old models? If we can’t do that with a fair amount of accuracy, then how are we to know which design changes were important and which were not? If we don’t know that, then how do we determine which traits to keep in play in further development?
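To make the attribution problem concrete, here is a toy sketch in Python. The multiplicative model and the three effect factors are invented for illustration. Only the 5.5 second baseline and the roughly four second result come from the example above. The sketch asks how much credit each change gets when the changes are added one at a time, in every possible order.

```python
from itertools import permutations

BASELINE = 5.5  # old car's 0-60 time in seconds, from the example above

# Hypothetical multiplicative effect of each design change on the 0-60 time.
EFFECTS = {"engine": 0.85, "transmission": 0.95, "weight": 0.90}


def zero_to_sixty(changes):
    """0-60 time after applying the named changes to the baseline car."""
    t = BASELINE
    for name in changes:
        t *= EFFECTS[name]
    return t


def credit_by_order(order):
    """Credit each change with the time it saves when added in this order."""
    credit, applied = {}, []
    for name in order:
        before = zero_to_sixty(applied)
        applied.append(name)
        credit[name] = before - zero_to_sixty(applied)
    return credit


total_gain = BASELINE - zero_to_sixty(EFFECTS)
solo_gains = {name: BASELINE - zero_to_sixty([name]) for name in EFFECTS}
engine_credits = sorted(credit_by_order(o)["engine"] for o in permutations(EFFECTS))

print(f"new 0-60 time: {zero_to_sixty(EFFECTS):.2f} s")
print(f"total gain: {total_gain:.2f} s, sum of solo gains: {sum(solo_gains.values()):.2f} s")
print(f"engine credit ranges from {engine_credits[0]:.2f} to {engine_credits[-1]:.2f} s")
```

Even in this toy model, where the effects are known exactly, the gains measured one change at a time do not add up to the total, and the credit assigned to the engine depends on the order in which the changes are considered. With a real car, where the effects are not known and interact in messier ways, the problem is worse.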
What does this imply about the role of deliberate designer intention in the evolutionary process of complex technical artifacts?
* * * * *
Finally, I note that Kaznatcheev and Kording devote a major section of the article to considerations derived from EvoDevo. I have been aware of EvoDevo for years, but know little about it. So this (kind of) material is new to me.
I like what they’re doing with it. They make the point that organisms, and complex technical assemblages, have an internal coherence and dynamic that constrains how they can be modified successfully. Changes must be consistent with existing structures and mechanisms. That does enforce order on the evolutionary process.
Abstract of the Article
Deep Learning (DL) is a surprisingly successful branch of machine learning. The success of DL is usually explained by focusing analysis on a particular recent algorithm and its traits. Instead, we propose that an explanation of the success of DL must look at the population of all algorithms in the field and how they have evolved over time. We argue that cultural evolution is a useful framework to explain the success of DL. In analogy to biology, we use ‘development’ to mean the process converting the pseudocode or text description of an algorithm into a fully trained model. This includes writing the programming code, compiling and running the program, and training the model. If all parts of the process don't align well then the resultant model will be useless (if the code runs at all!). This is a constraint. A core component of evolutionary developmental biology is the concept of deconstraints – these are modification to the developmental process that avoid complete failure by automatically accommodating changes in other components. We suggest that many important innovations in DL, from neural networks themselves to hyperparameter optimization and AutoGrad, can be seen as developmental deconstraints. These deconstraints can be very helpful to both the particular algorithm in how it handles challenges in implementation and the overall field of DL in how easy it is for new ideas to be generated. We highlight how our perspective can both advance DL and lead to new insights for evolutionary biology.
 I have prepared a brief sketch laying out various approaches that are being taken to study cultural evolution: A quick guide to cultural evolution for humanists, Working Paper, November 14, 2019, 4 pp., https://www.academia.edu/40930224/A_quick_guide_to_cultural_evolution_for_humanists
 Arthur De Vany, Hollywood Economics: How Extreme Uncertainty Shapes the Film Industry, Routledge, 2004. I’ve written a brief review: Chaos in the Movie Biz: A Review of Hollywood Economics, New Savanna, December 9, 2018, https://new-savanna.blogspot.com/2012/05/chaos-in-movie-biz-review-of-hollywood.html
Monday, May 23, 2022
My latest 3QD piece is now up:
The piece is organized around some things I did on the web back in 2006 and 2007 and some other things I did back in the mid-1990s, soon after the web was born. All those things were fun. Mark Zuckerberg has changed the name of his company, to Meta, and has made the Metaverse the company goal. I’m skeptical that any Metaverse that comes out of that company will be half as fun as the events I report in this post.
Back in 2007 I made a bunch of posts to Mostly Harmless, all directed at two young Japanese-American girls. The last of them is a tale of adventure and mystery entitled, appropriately enough, Sparkychan & Gojochan Adventure Time Mystery Theatre. That was a lot of fun. That’s the Metaverse part of the post. My contention is that nothing out of FAANG (Facebook, Apple, Amazon, Netflix, Google) in the future is going to be as much fun as that.
Those particular events were preceded by some events at Michael Bérubé’s blog, American Air Space. It’s defunct, but you can find swathes of it on the Wayback Machine. In particular, you can find the Great Show Trial of 2006. That too was a lot of fun.
Neither the Show Trial nor the Sparkychan & Gojochan stories required the kind of elaborate, and no doubt expensive (and profitable), equipment that’s being dreamed up for the Metaverse. And yet somehow we managed to get along with one another – thank you very much – and have, you guessed it, fun.
Things were even more primitive back in 1995 when Bill Berry created Meanderings and then Gravity. Bill had to buy a server and code up the back end himself; he coded a discussion section as well. Everything was hand-coded in HTML. Talk about primitive! And yet we had fun and created a community. I’m still in touch with Bill and other folks I met at Meanderings, and with folks I met at American Air Space and Mostly Harmless.
Those places worked because we wanted them to work. We had things we wanted to do. The web offered various tools. And so we figured out how to use those tools to do what we wanted to do.
Back in the mid-1990s things were wide-open and free. They were still that way in 2006-2007, though by then we did have advertising on the web. Big companies were trying to monetize the web. No problem.
But it’s not like it is now. Something happened between then and now. That something may have been good for business, but it’s not been so good for civility and civic culture. I have little reason to believe that, in their pursuit of the Metaverse and AGI (artificial general intelligence), FAANG will be much concerned about civic culture, unless regulators force them to act concerned. Why should they? They’re in it for the money.
Truth be told, I’m not quite that cynical. FAANG does consist of 100s of 1000s of human beings and they have their human concerns. But those concerns are being swamped by business concerns.
And so forth.