The Aaronson/Pinker debate on AI scaling generated a lot of commentary, including some from me and some from NYU’s Ernie Davis, who works closely with Gary Marcus. I’ve gathered some of those together in this post. But first....
FYI: "If we do try to define “intelligence” in terms of mechanism rather than magic, it seems to me it would be something like “the ability to use information to attain a goal in an environment.”" this is how John McCarthy defined intelligence decades ago (almost verbatim)
— Madame Pratolungo joining Mastodon (@MadamePratolung) July 1, 2022
What’s interesting is that this definition casts intelligence as a relation between some device (natural or artificial) and the environment in which it operates. That relationship has been dogging AI for some time.
Moravec’s paradox
Here is my first contribution to the debate (comment #81):
There is a song lyric, "Fools rush in, where angels fear to tread." Call me a fool.
Scott #33:
...stepping back: my exchanges with you, Steve, and others have been useful for me, in clarifying how “the power or powerlessness of pure intellectual ability to shape the world” is really at the heart of the entire AGI debate.
Well, yes, though the first time I read that I gave it a very reductive reading where "pure intellectual ability" was something like "computational horsepower". The relationship between computational horsepower and pure intellectual ability (whatever that might be) is at best unspecified. Still, computational horsepower is certainly at the center of current debates about scaling. And it's quite clear that the abundance of relatively cheap compute has been extraordinarily important.
Take chess, which has been at the center of AI since before the 1956 Dartmouth conference. Chess is a rather special kind of problem. From an abstract point of view it is no more difficult than tic-tac-toe. Both are finite games played on a very simple physical platform. However, the chess tree is so very much larger than the tic-tac-toe tree that playing the game is challenging for even the most practiced adults, while tic-tac-toe challenges no one over the age of, what? seven?
However, the fact that the chess tree is generated from a relatively simple basic structure (a board of 64 squares, 32 pieces, highly restrictive rules) means that compute can be thrown at the problem in a relatively straightforward way. And the availability of compute has been important in the conquest of chess. It's certainly not the only thing, but without it, we'd be stuck where we were well before Deep Blue beat Kasparov.
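To get a feel for the gap, here is a minimal Python sketch (my own construction, not from the debate): it counts the full tic-tac-toe game tree by brute force and sets that against Shannon's textbook estimate for chess, a branching factor of roughly 35 over roughly 80 plies.

```python
# Brute-force count of the tic-tac-toe game tree vs. a back-of-the-envelope
# estimate of the chess game tree (Shannon's figures: branching factor ~35,
# game length ~80 plies). Illustrative only.

def tictactoe_tree_size(board=(" ",) * 9, player="X"):
    """Count every node in the tic-tac-toe game tree, stopping at wins/draws."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:            # someone has already won: leaf node
        if board[a] != " " and board[a] == board[b] == board[c]:
            return 1
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:                    # board full: a draw, also a leaf
        return 1
    nxt = "O" if player == "X" else "X"
    return 1 + sum(
        tictactoe_tree_size(board[:i] + (player,) + board[i + 1:], nxt)
        for i in moves
    )

print("tic-tac-toe game tree:", tictactoe_tree_size(), "nodes")  # ~550,000
print("chess game tree:      ~ %.1e nodes" % 35.0 ** 80)         # ~10^123
```

Half a million nodes is nothing; a laptop exhausts the game in seconds. Ten to the 123rd is more than the number of atoms in the observable universe, which is why chess needs clever search rather than exhaustion, and why raw compute plus a well-defined tree got us as far as it did.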
In contrast, image recognition, machine translation, and common sense knowledge are quite different in character from chess. The number of possible images is unbounded, and they come in all forms. In language, the number of word types may be finite, but it is not well-defined, and the number of different texts is unbounded. The same goes for common sense. Throwing more and more compute at these problems helps, but computational approaches to them, and others like them, have not produced computers that perform at the Kasparov level, or better, in those respective domains.
This has been known for a long time, and it has a name: Moravec’s paradox. I think we should keep it in mind during these discussions.
Note that Moravec’s paradox is about the nature of the environment in which computation is tasked with achieving goals. Some environments are more amenable to computational regimes we understand than others.
Ernie Davis on computational speed and intelligence
Here is his comment #18, in full:
Let me suggest the following thought experiment. Suppose we take some mediocre, stick-in-the-mud scientist from 1910 who rejected not just special relativity but also atomic theory, the kinetic theory of heat, and Darwinian evolution — there were, of course, quite a few such. Now speed him up by a factor of 1000. One’s intuition is that [the] result would be thousands of mediocre papers, and no great breakthroughs. On the other hand, it doesn’t seem right to say that Einstein, Planck and so on were 1000 times more intelligent than him; in terms of measures like IQ, they may not have been at all less intelligent than him. So I am really doubtful that this speeding up process has much to do with genius in the sense of Einstein et al. And therefore I think your intuition about speeding up Einstein by a factor of 1000 is also wrong. Had we speeded up Einstein by a factor of 1000 during his lifetime starting in 1905, we might have gotten the great papers of 1905 within a day (as fast as he could physically write them) and general relativity within a week (ignoring the fact that that involved interactions with non-speeded up people), but I don’t think you can be confident about how much more we would have gotten.
And some passages from his comment #24:
On the last point: I think that the terminology does matter, because the view that “intelligence” is a well-defined, scalar, characteristic of minds, shown in its highest degree by people of exceptional intellectual accomplishment, is an error, and not an innocuous one. There is really very little reason to think that the qualities of mind that made Jane Austen exceptional had anything at all in common with the quality of mind that made Ramanujan exceptional; or the qualities of mind that made Chopin, Emily Dickinson, William James, or Rachel Carson exceptional. [...]
Of course, if you take all of human history and, so to speak, videotape it and then run the video tape at 1000 x speed, then things happen 1000 times as fast. So what?
Indeed, so what?
What if searching for ideas is like searching for diamonds?
This is an idea I explored more extensively in a post from 2020, Stagnation, Redux: It’s the way of the world [good ideas are not evenly distributed, no more so than diamonds]. I subsequently incorporated that post into a working paper, What economic growth and statistical semantics tell us about the structure of the world.
Comment #108:
I would like to elaborate on the comment Ernie Davis made at #18, because I suspect he’s correct. I suspect that 1000X Einstein would have given us his great work rather quickly but that [he] would [then] have proceeded out into the same intellectual desert the real Einstein explored, but managed to explore it much more thoroughly, with, alas, the same success.
Just how are ideas distributed in idea space? (Is that even a coherent question?)
Let me suggest an analogy: diamonds. We know that they are not evenly distributed on or near the earth’s surface. Most of them seem to be in kimberlite (a type of rock) and that’s where diamond mines are located. Even there, they are few, far between, and irregularly located. So it takes a great deal of labor to find each diamond.
Now, imagine we have a robot that can find diamonds at 1000 times the rate human miners can, but costs only, say, 10 or even 100 times more per hour. Such robots would be very valuable. Now, let’s place a bunch of these 1000X robots on some arbitrary chunk of land and let them dig and sort away. What are they going to find? Probably nothing. Why? Because there are no diamonds there. They may be very good at excavating, moving, crushing, and sorting through earth, but if there are no diamonds there, the effort is wasted.
Perhaps ideas and idea space are like that. The ideas are unevenly distributed. We have no maps to guide us to them. But we have theories, and hunches, an intellectual style. Think of them collectively as a mapping procedure. So, Einstein had his intellectual style, his mapping procedure. That led to roughly a decade of important discoveries in his 20s and 30s, like diamond miners working in kimberlite. And then, nothing, like diamond miners working, say, in the middle of Vermont. Nice country, but no diamonds.
As for idea space, we can imagine it by analogy with chess space. But we know how to construct chess space, though it is too large for anything approaching a complete construction. And that knowledge allows us to construct useful procedures for searching it. We haven’t a clue about how to construct idea space, much less how to search it effectively. If speed is all we’ve got, it’s not clear how much that gets us in the general case.
It’s not at all obvious that we need the notion of idea space in the case of Einstein, and similar cases. Einstein’s just searching the world for a fit between his best thinking and natural phenomena. Chess space, of course, is different. It is entirely artificial; we created it when we created the game. The world Einstein explored pre-existed him (and us).
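The analogy is easy to turn into a toy simulation. Here is a minimal Python sketch (my own construction; the landscape size, cluster location, and densities are arbitrary): diamonds sit in one small stretch of a large landscape, and multiplying a prospector’s speed does nothing if he starts in the wrong place.

```python
# Toy model: diamonds are clustered in one small "kimberlite" stretch of a
# large 1-D landscape. A prospector random-walks from a starting site,
# checking one site per step. Speed multiplies sites checked, nothing more.
import random

random.seed(0)
LANDSCAPE = 1_000_000                       # total number of sites
KIMBERLITE = range(700_000, 701_000)        # the only stretch with diamonds
diamonds = {s for s in KIMBERLITE if random.random() < 0.05}

def prospect(start, steps):
    """Random walk for `steps` site-visits; return distinct diamonds found."""
    pos, found = start, set()
    for _ in range(steps):
        pos = max(0, min(LANDSCAPE - 1, pos + random.choice((-1, 1))))
        if pos in diamonds:
            found.add(pos)
    return len(found)

# A 1000x robot dropped far from the kimberlite...
print("fast robot, wrong place:", prospect(start=100_000, steps=1_000_000))
# ...versus an ordinary prospector who happens to start inside it.
print("slow miner, right place:", prospect(start=700_500, steps=1_000))
```

The fast robot reports zero; a million steps of random walking will essentially never carry it across the 600,000 sites between it and the kimberlite. The slow prospector, by luck of initial position, typically finds a handful. Speed multiplies effort, not knowledge of where to dig.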
Five factors of genius/intelligence
Further response to Davis, #18:
Suppose we take some mediocre, stick-in-the-mud scientist from 1910 who rejected not just special relativity but also atomic theory, the kinetic theory of heat, and Darwinian evolution — there were, of course, quite a few such. Now speed him up by a factor of 1000. One’s intuition is that [the] result would be thousands of mediocre papers, and no great breakthroughs. On the other hand, it doesn’t seem right to say that Einstein, Planck and so on were 1000 times more intelligent than him; in terms of measures like IQ, they may not have been at all less intelligent than him.
Speed is one thing. And IQ is another. Einstein had something else. I suppose we could call it genius; in fact we do, don’t we? But that doesn’t tell us much.
For the sake of argument – I’m just making this up as I type – let’s say one aspect of that something else is intellectual technique. Einstein had more effective intellectual tactics and strategies than those standard investigators. Intellectual technique may, in turn, have a genetic aspect that’s not covered by IQ, but almost certainly has a learned aspect as well.
So now we have four things: 1) speed/compute, 2) IQ, 3) an inherited component of technique, and 4) a learned component of technique.
I’m going to posit one more thing, again, thinking off the top of my head. We might call it luck. Or, if we’re thinking in terms of something like idea space, we could call it initial position. By virtue of 1, 2, 3, and perhaps 4 as well, the so-called genius is at a position in idea space that allows them to make major discoveries by deploying their cumulative capabilities. The point of this initial-position factor is to allow for the possibility of a cohort of thinkers more or less equally endowed with 1, 2, 3, and 4, but having very different initial positions. As a consequence, some are able to achieve major discoveries quickly, while others take more time, and still others never get there. Their capabilities are comparable, but their outcomes are not.
To invoke the diamond mining metaphor I introduced in comment #108, we have two equally skilled geologists/prospectors. One just happens to be located within 100 miles of a major kimberlite deposit while the other is over 3000 miles away from such a deposit. If they start walking from where they are, who’s going to find diamonds first?
In the case of AI, we know a great deal about compute/speed; we have that under control. I’m not sure just how the distinction between innate vs. learned techniques applies to machines, perhaps hardware and software. In any case, we do have a large repertoire of techniques of various kinds. In some areas we can produce a combination of compute and technique that allows the machine to outperform the best human. In other areas we have machines that do things that are amazing in comparison with what machines did, say, a decade ago, but which are no more than standard human performances, with various failings here and there. And so on. As for starting position, I think it’s up to us to position the AI properly, at least at the start.
[But once and if it FOOMs, it’s on its own. I’m not holding my breath on this one.]
Figure 5 in the 2020 New Savanna post gives a visual illustration of the initial position idea.
We now have a total of five factors:
1) compute,
2) IQ,
3) an inherited component of technique,
4) a learned component of technique, and
5) initial position or luck.
The seductiveness of scale
The hope of the scaling side of the current debate is that we can get all the way to AGI – whatever that is – by throwing more compute at the problem. Well, it’s not that simple; the compute has to be channeled through an appropriate machine-learning architecture, which then chews its way through a huge pile of (appropriately curated) data. That is, architecture+data will cover the ground I’ve indicated in factors 2-5 above, thereby relieving us of the need and responsibility to think about those things.
It’s a seductive prospect. Why? In part because it is easy to understand, even by people who have little or no technical knowledge of computing, cognitive science, and AI. Everyone knows and understands, “bigger is better.”
GOFAI (good old fashioned artificial intelligence) was mostly about technique, factors 3 and 4. That technique was generally taken to be mediated by symbolic systems and, as a practical matter, it required that ‘knowledge’ be painstakingly hand-crafted into systems. While I can understand the desire to avoid hand-crafted knowledge – there’s so very much of it and the crafting is tedious and error-prone – I don’t think symbolic computation can be avoided. Can it be architected, as it were, into a learning regime? We know one case where it has been, the human case, but that case tells us that learning requires a lot of close interaction between teachers and students, in both formal and informal settings. It’s not at all clear to me that such interaction can be architected.
More later.
Addendum, 7.1.22, on superintelligence: Alex, comment #171:
I think Pinker’s definition of intelligence, “the ability to use information to attain a goal in an environment”, is reasonable, but it doesn’t give us any meaningful way to compare or rank intelligences (so how can we meaningfully discuss “superintelligence”?). Of course, you chose compute time as the metric, but I think that dodges the more meaningful aspects of intelligence. I think a metric like computational complexity – or even Kolmogorov complexity – is more appealing to me, but whatever the metric, I think it has to capture the mechanism of thought in some way, not just the output. [...]
As a final note, I think “intelligence” is a crude word that tries to capture too many aspects of behavior (many of them human-relatable, but not of great importance to discussion). My comment here has been an attempt to break up “intelligence” into constituent parts to focus discussion: clock speed, memory, algorithmic/time complexity, size/space complexity. There are surely more parts of “intelligence”, some parts that are combinations of simpler parts.
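Alex’s decomposition invites a quick illustration. Clock speed and algorithmic complexity really are independent axes: a minimal Python sketch of my own (not Alex’s; the 1000x figure and the O(n²) vs. O(n log n) pairing are arbitrary) shows a machine with a 1000x clock advantage losing, on large inputs, to an unaccelerated machine running a better algorithm.

```python
# Two axes of "intelligence": clock speed vs. algorithmic complexity.
# A 1000x-faster machine running an O(n^2) method still loses, for large n,
# to a 1x machine running an O(n log n) method. Illustrative figures only.
import math

def steps_quadratic(n):   # e.g., insertion sort
    return n * n

def steps_nlogn(n):       # e.g., mergesort
    return n * math.log2(n)

for n in (10**3, 10**5, 10**7):
    fast_bad = steps_quadratic(n) / 1000   # 1000x clock speedup
    slow_good = steps_nlogn(n)             # no speedup, better algorithm
    print(f"n={n:>10,}:  1000x O(n^2) ~ {fast_bad:.2e}   "
          f"1x O(n log n) ~ {slow_good:.2e}")

# The crossover is around n ~ 14,000; past that, the better algorithm wins
# despite the 1000x slower clock.
```

That is one way of cashing out the intuition that speeding up a mediocre thinker by 1000x just gets you mediocre results faster.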
Scott, comment #172:
Fundamentally, I care, not about the definitions of words like “superintelligence,” but about what will actually happen in the real world once AIs become much more powerful. [...] So OK then, what happens when we can launch a billion processes in datacenters, each one with the individual insight of a Terry Tao or Edward Witten (or the literary talent of Philip Roth, or the musical talent of the Beatles…), and they can all communicate with one another, and they can work at superhuman speed? Is it not obvious that all important intellectual and artistic production shifts entirely to AIs, with humans continuing to engage in it (if they do) only as a hobby? That’s the main question I care about when I discuss “superintelligence,” and I’m still waiting for anyone to explain why I’m wrong about it.