NEW SAVANNA: AGI as shibboleth, symbols [reacting to Jack Clark]

Sunday, August 7, 2022

AGI as shibboleth, symbols [reacting to Jack Clark]

Jack Clark has a LONG tweet stream on AI policy. Though I don’t agree with every tweet – would anyone? – it’s worth at least a quick look. I want to comment on two of the tweets.

AGI as shibboleth, and beyond

Discussions about AGI tend to be pointless as no one has a precise definition of AGI, and most people have radically different definitions. In many ways, AGI feels more like a shibboleth used to understand if someone is in- or out-group wrt some issues.
— Jack Clark (@jackclarkSF) August 6, 2022

Has AGI ever been anything other than a shibboleth? I believe the term was coined in the 1990s because some researchers felt that AI had become stale and focused on specialized domains, so-called “narrow” AI. The phrase “artificial general intelligence” (AGI) served as a banner under which to revive the founding goal of AI, to construct the artificial equivalent of human intelligence.

What researchers actually construct are mechanisms. But no one knows how to specify a mechanism or set of mechanisms for AGI. Oh, sure, there’s the Universal Turing machine which can, in point of abstract theory, compute any computable function. It may be a mechanism, but the idea is so abstract that it provides little to no guidance in the construction of computer systems.

AGI, like AI before it, is an abstract goal, a beacon, without a procedure that will lead to it. No matter how vigorously you chase over the surface of the earth for the North Star, you’re never going to get there. And so AGI simply functions as a shibboleth. If you want into the club, you have to pledge allegiance to AGI.

But you don’t need to pledge allegiance in order to construct interesting and even useful systems. So why invent this unreachable goal? Is it just to define a club?

Meanwhile I’ve written a paper in which I define the idea of an artificial mind. I begin by defining mind:

A MIND is a relational network of logic gates over the attractor landscape of a partitioned neural network. A partitioned network is one loosely divided into regions where the interaction within a region is (much) stronger than the interactions between regions. Each of these regions will have many basins of attraction. The relational network specifies relations between basins in different regions.

Note that the definition takes the form of specifying a mechanism involving logic gates and a neural network. Given that:

A NATURAL MIND is one where the substrate is the nervous system of a living animal.

And:

An ARTIFICIAL MIND is one where the substrate is inanimate matter engineered by humans to be a mind.

There are other definitions as well as some caveats and qualifications.

However, those definitions come after 50 pages of text and diagrams in which I lay out the mechanisms that support those definitions. The paper is primarily about the human brain, but one can imagine constructing artificial devices that meet those specifications. Now, whether those specifications are the right specifications, that’s open for discussion. However that discussion turns out, it is a discussion about mechanisms, not myths and magic.
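To give a feel for what "specifying a mechanism" can look like, here is a deliberately tiny sketch. It is not the model in the paper. As assumptions of my own, it swaps in small Hopfield-style networks as a familiar stand-in for "a region with basins of attraction," sets the interaction between regions to zero (the extreme case of "much weaker"), and uses a single AND gate as the relational network. The names `Region`, `basin`, and `and_gate` are hypothetical.

```python
from typing import List, Optional

Pattern = List[int]  # each unit is +1 or -1


class Region:
    """One region of the partitioned network, modeled as a toy Hopfield-style net."""

    def __init__(self, stored: List[Pattern]) -> None:
        self.stored = stored
        n = len(stored[0])
        # Hebbian weights, no self-connections; intended to make each
        # stored pattern the bottom of a basin of attraction.
        self.weights = [
            [0 if i == j else sum(p[i] * p[j] for p in stored) for j in range(n)]
            for i in range(n)
        ]

    def settle(self, state: Pattern, max_steps: int = 10) -> Pattern:
        """Update all units synchronously until the state stops changing."""
        for _ in range(max_steps):
            fields = [sum(w * s for w, s in zip(row, state)) for row in self.weights]
            # A unit with zero input keeps its current value.
            new_state = [s if h == 0 else (1 if h > 0 else -1)
                         for h, s in zip(fields, state)]
            if new_state == state:
                break
            state = new_state
        return state

    def basin(self, state: Pattern) -> Optional[int]:
        """Index of the stored pattern the state settles into, or None."""
        settled = self.settle(state)
        return self.stored.index(settled) if settled in self.stored else None


def and_gate(*inputs: bool) -> bool:
    """Relational node: fires only if every input condition holds."""
    return all(inputs)


# Two orthogonal patterns serve as the attractors in each region.
P0 = [1, 1, 1, 1, -1, -1, -1, -1]
P1 = [1, -1, 1, -1, 1, -1, 1, -1]

region_a = Region([P0, P1])
region_b = Region([P0, P1])

noisy_a = [-1, 1, 1, 1, -1, -1, -1, -1]  # P0 with its first unit flipped
noisy_b = [1, -1, 1, -1, 1, -1, 1, 1]    # P1 with its last unit flipped

# The relation holds when region A is in basin 0 AND region B is in basin 1.
relation_holds = and_gate(region_a.basin(noisy_a) == 0,
                          region_b.basin(noisy_b) == 1)
print(relation_holds)
```

The point of the toy is only that the gate operates on which basin each region has fallen into, not on the raw unit activity. Whether anything like this scales up to a mind is exactly the kind of question that can be argued in terms of mechanisms.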

The paper:

Relational Nets Over Attractors, A Primer: Part 1, Design for a Mind, Version 2, Working Paper, July 13, 2022, 76 pp., https://www.academia.edu/81911617/Relational_Nets_Over_Attractors_A_Primer_Part_1_Design_for_a_Mind

Ah, symbols

Here’s a twofer:

Richard Sutton's The Bitter Lesson is one of the best articulations of why huge chunks of research are destined to be irrelevant as a consequence of scale. This makes people super mad, but also seems like a real phenomenon.
— Jack Clark (@jackclarkSF) August 6, 2022

It's the first tweet that interests me, but let’s look at Richard Sutton’s bitter lesson. Here’s his opening paragraph:

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.

Sutton then goes on to list domains where this has proven to be so: chess, Go, speech recognition, and computer vision. He then draws some conclusions, which I want to bracket.

Note, however, that Sutton talks of researchers seeking “to leverage their human knowledge of the domain.” Is that what’s going on in symbolic AI? Perhaps in expert systems, which may have been the most pervasive practical result of GOFAI. But I don’t think that’s an accurate general characterization. That’s not what was going on in computational linguistics, for example, or in much of the work on knowledge representation. That research was based on the belief that much of human knowledge is inherently symbolic in character and therefore that we must create models that capture that symbolic character.

Why did those models collapse? I think there are several factors involved:

1. Combinatorial explosion: Symbolic systems tend to generate large numbers of alternatives with little or no way of choosing among them.

2. Hand coding: Symbolic systems have to be painstakingly hand-coded, which takes time.

3. Too many models, difficult to choose among them: This exacerbates the hand-coding problem.

4. Common sense has proven elusive: But then it has proven elusive for deep learning as well.

Perhaps the first problem can be solved through more computing power, though exponential search can easily outstrip the addition of CPU cycles and memory. The third problem is one for science, and is, I believe, entangled with the fourth one. The second problem is inconvenient, but, alas, if hand-coding is necessary, then it’s necessary. But perhaps if we’re clever....
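On the first problem, a little arithmetic shows why more computing power helps less than one might hope. The sketch below is my own illustration, assuming an idealized search tree in which every node has the same number of alternatives (the branching factor of 10 and the function names are hypothetical). It asks how deep an exhaustive search can go for a given budget of node expansions.

```python
def nodes_in_tree(branching: int, depth: int) -> int:
    """Nodes in a full search tree with the root at depth 0."""
    return sum(branching ** d for d in range(depth + 1))


def max_depth_within_budget(branching: int, budget: int) -> int:
    """Deepest full tree that can be searched with `budget` node expansions.

    Assumes budget >= 1, so that at least the root can be expanded.
    """
    depth = 0
    while nodes_in_tree(branching, depth + 1) <= budget:
        depth += 1
    return depth


if __name__ == "__main__":
    branching = 10  # ten alternatives at every choice point (an assumption)
    for budget in (10**6, 10**9, 10**12):
        depth = max_depth_within_budget(branching, budget)
        print(f"budget {budget:>16,} -> depth {depth}")
```

Under these assumptions a budget of a million expansions reaches depth 5, a billion reaches depth 8, and a trillion reaches depth 11. Each thousandfold increase in computation buys only three more levels of search, which is why some way of choosing among alternatives matters more than raw cycles.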

On the fourth one, here’s what I said in my GPT-3 paper:

A lot of common-sense reasoning takes place “close” to the physical world. I have come to believe, but will not here argue, that much of our basic (‘common sense’) knowledge of the physical world is grounded in analogue and quasi-analogue representations. This gives us the power to generate language about such matters on the fly. Old school symbolic machines did not have this capacity nor do current statistical models, such as GPT-3.

Thus the problem is not specific to symbolic systems. It is quite general. It’s not at all clear that we can deal with this problem without having robots out and about in the world. I note that the working paper I mentioned in the previous section, Relational Nets Over Attractors, is about constructing symbolic structures over quasi-analog representations, which, following the terminology of Saty Chary, I characterize as structured physical systems.

Let’s return to Sutton’s paper. Here’s his final paragraph:

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

I’m hesitant to think of symbol systems as being “simple ways to think about the contents of minds.” That strikes me as rhetorical overkill. But Sutton is right about “the arbitrary, intrinsically-complex, outside world.” He says that “we should build in only the meta-methods that can find and capture this arbitrary complexity.” Well, sure, why not?

But are we doing that now? That’s not at all obvious to me. It seems likely to me that the DL community is hoping that they’ve discovered the meta-methods, or will do so in the near future, and so we don’t have to think about what’s going on inside either human minds or the machines we’re building. Well, if human minds use symbols, and it seems all but self-evident that we do – if language isn’t a symbol system, what is? – then the current repertoire of DL methods is not up to the task.

What meta-methods are needed to detect patterns of symbolic meaning and construct those quasi-analog representations?

My GPT-3 paper:

GPT-3: Waterloo or Rubicon? Here be Dragons, Version 4.1, Working Paper, May 7, 2022, 38 pp., https://www.academia.edu/43787279/GPT_3_Waterloo_or_Rubicon_Here_be_Dragons_Version_4_1
