NEW SAVANNA: No one had predicted GPT-3. How do you update your priors? [Why learning from history is hard]

Thursday, July 7, 2022

More from the Pinker/Aaronson debate on AI scale.

Aaronson at comment #240:

It’s true that I utterly failed to predict the deep learning revolution. I was certainly aware of the thesis, which I associated with Ray Kurzweil, that before long Moore’s Law would cause machines to have as many computing cycles as the human brain, and at that point we should expect human-level AI to “just miraculously emerge.” That struck me as one of the stupidest theses I’d ever heard! Computing cycles aren’t just magical pixie dust, I’d explain: you’d also need a whole research effort, which could take who knows how many centuries or millennia, to figure out what to do with the cycles!

Now it turns out that the thesis was … well, we still don’t know if it’s right all the way to AGI, and certainly great new ideas (GANs, transformer models, etc.) have also played a role, but it’s now clear that the “computing cycles as magic pixie-dust” thesis contained more rightness than almost anyone imagined back in 2000.

So, this is my excuse: I’m not contradicting myself (which is bad), I’m updating based on new evidence (which is good).

But my real excuse is that hardly any of the experts predicted this either. And I just had dinner with Eliezer a couple weeks ago, and he told me that he didn’t predict it. He was worried about AGI in general, of course, but not about the pure scaling of ML. The spectacular success of the latter has now, famously, caused him to say that we’re doomed; the timelines are even shorter than he’d thought.

While it caused Eliezer to update from “we should all worry about this” to “screw it, we’re doomed,” it caused me and quite a few others to update from “we shouldn’t all worry about this” to “we should all worry about this.”

Me at comment #280 after quoting from Scott’s #240:

It caused me to update from “the space of possible minds is huge” to “the space of possible minds is even larger than I thought it was.” My update is different from yours, but doesn’t necessarily contradict it. More like orthogonal to it.

This language of “updating” comes from Bayesian statistics, which I do not know on a technical level. But then, in this kind of context, it is not used technically. This usage is ubiquitous in the so-called rationalist community.

Roughly speaking, you have some idea of what’s going on in some domain, in this case, artificial intelligence. That idea is your prior, and it implicitly entails predictions about how that domain will unfold over time. If things unfold in a way that is consistent with your prior views, then things won’t surprise you. When something surprising happens, though, that’s a signal that your priors are wrong. You must now adjust your priors. That’s what Aaronson is talking about in the last three paragraphs I quoted from him and what I’m talking about in my paragraph. Now, while Bayesianism, as it is used informally here, tells you to update your priors, it doesn’t tell you just how to update them.
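For contrast with the informal usage, here is a minimal sketch of what a technical Bayesian update looks like. The two hypotheses and every number below are made up for illustration. None of them come from the debate. The rule itself is mechanical; what it does not supply is the list of hypotheses or the likelihoods, and that is where two people looking at the same surprise can part ways.

```python
def bayes_update(prior, likelihoods):
    """Apply Bayes' rule: posterior is proportional to prior * P(evidence | hypothesis)."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical prior: strong skepticism that raw scale matters much.
prior = {"scale_matters": 0.1, "scale_is_pixie_dust": 0.9}

# Assumed chance of seeing a GPT-3-like result under each hypothesis.
likelihoods = {"scale_matters": 0.8, "scale_is_pixie_dust": 0.05}

posterior = bayes_update(prior, likelihoods)
for hypothesis, probability in posterior.items():
    print(f"{hypothesis}: {probability:.2f}")
```

With these assumed numbers the skeptic moves from 0.10 to 0.64 on "scale_matters". Someone who started with a different set of hypotheses, say about how large the space of possible minds is, would run the same arithmetic and land somewhere else entirely.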

I continue my comment with my now standard analogy for dealing with large complex problems, Christmas tree lights:

I’m a bit more interested in understanding the brain than I am in scaling Mount AGI. Here’s how I’ve been thinking about understanding the brain. Imagine that understanding takes the form of a string of serial-wired Christmas tree lights, 10,000 of them. To consider the problem solved, all the lights have to be good so that the string lights up.

Instead of understanding the brain, apply the analogy to understanding how to create AGI (whatever that is). Let’s start at 1956, the year of the Dartmouth conference. It’s at that point we were handed the string and were told, “get this to light up and you’ve solved AI.” Since digital computing had been a going concern for over a decade at that point and work had already been done on chess and on natural language, some of the bulbs in that 10,000-bulb string were good. But we didn’t know how many or where they were. Between 1956 and whenever OpenAI started working on GPT-3 we’d replaced, say, 2037 bad bulbs with good ones. Let’s say that in creating GPT-3 OpenAI replaced 100 bulbs, which we know about. So 2137 bulbs have been replaced. How many more bulbs to go before all of them are good?

Obviously we don’t know. Some people seem to think it’s only a couple of hundred or so, most of them having to do with scaling up even further. What, beyond wishful thinking, justifies both the belief that the unknowns cluster in one area and that their number is so low? Maybe we still have over 5000 or 6000 to go, maybe more. Who knows?
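The bookkeeping in the analogy can be sketched in a few lines. The bulb counts are the ones given above; the function names and the two trial values for the number of bulbs that were already good in 1956 are hypothetical, since that number is exactly what nobody knows.

```python
TOTAL_BULBS = 10_000
REPLACED_BEFORE_GPT3 = 2037
REPLACED_BY_GPT3 = 100

replaced_so_far = REPLACED_BEFORE_GPT3 + REPLACED_BY_GPT3  # 2137


def string_lights_up(bulbs):
    """A serial-wired string lights only if every single bulb is good."""
    return all(bulbs)


def remaining_bad(initially_good):
    """Bad bulbs left, given a guess at how many were already good in 1956."""
    return TOTAL_BULBS - initially_good - replaced_so_far


# Upper bound: the case where no bulb was good to begin with.
max_remaining = remaining_bad(0)

print(replaced_so_far)      # bulbs replaced to date
print(max_remaining)        # most that could still be bad
print(remaining_bad(7663))  # an optimist's guess at the starting state
print(remaining_bad(2000))  # a pessimist's guess at the starting state
```

The same replacement count is compatible with 200 bulbs to go or with well over 5000, depending entirely on an unobserved starting state, and one bad bulb is enough to keep the whole string dark.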

That is to say, it’s one thing to adjust your priors by thinking we’re now at long last on the right track and quite different to think, as I did, the world just got much larger. Those different updates reflect, in effect, two different sets of priors.

I go on to say something about where my priors come from:

By way of calibration, I should note that back in the era of symbolic computing I had once felt – though never published – that we were within 20 years of being able to build a system that read Shakespeare plays in an “interesting” way. By “interesting” I meant that we could have the system read, say, The Winter’s Tale, and then we’d open it up, trace what it did, and thereby learn what happens when we humans read that play. That is to say, I believed we could construct a system that could reasonably be construed as a simulation of the human mind. Alas, the AI Winter of the mid-1980s killed that dream. While these new post-GPT-3 systems are wonderful, I see little prospect that any of them can be considered a simulation of the human mind, nor that any of them will be able to offer insight into Shakespeare in the near or mid-term future. Beyond, say, 2140 (the year of Kim Stanley Robinson’s New York 2140) I’m not prepared to say.

My sense of such matters is that reading about such collapses of intellectual projects in a history book is not the same as living through one. The valence is much weaker. So I’m sticking with my new prior, “the space of possible minds is even larger than I thought it was.”

And THAT difference, I believe, is crucial. Living through a set of events, such as the AI Winter, affects your priors in a way that is quite different from only knowing about those events from a historical account. In both cases we’re dealing with the same ongoing stream of events, the evolution of AI research from the 1950s into the present and on to the future. I suspect the difference can ultimately be traced to the brain. What someone reads about events simply does not affect “deep circuitry” the same way as experiencing those same events, even if one really, really believes what was read.

More generally, this is one reason that learning from history is so very difficult. What you’ve lived through is much more potent, and has a greater effect on your updating mechanism, than what you’ve only read about or heard from third parties. Everyone’s experience is necessarily limited. If there is any wisdom that does indeed accrue to age, this would be one source of it. But there is obviously a limit to how long one person can live.

I’ve discussed this before, in particular, in a post from 2021, Things change, but sometimes they don’t: On the difference between learning about and living through [revising your priors and the way of the world].