
Tuesday, April 12, 2022

If a million monkeys are typing out code, how long before they produce a coherent program that does something useful?

Yesterday Eric Jang and Santiago Renteria had an interesting conversation in the Twitterverse about a recent blog post in which Scott Aaronson discussed DeepMind’s AlphaCode paper. Aaronson seemed overly sensitive about recent criticisms of that work. Here's what he said:

Yes, I realize that AlphaCode generates a million candidate programs for each challenge, then discards the vast majority by checking that they don’t work on the example data provided, then still has to use clever tricks to choose from among the thousands of candidates remaining. I realize that it was trained on tens of thousands of contest problems and millions of solutions to those problems. I realize that it “only” solves about a third of the contest problems, making it similar to a mediocre human programmer on these problems. I realize that it works only in the artificial domain of programming contests, where a complete English problem specification and example inputs and outputs are always provided.

Forget all that. Judged against where AI was 20-25 years ago, when I was a student, a dog is now holding meaningful conversations in English. And people are complaining that the dog isn’t a very eloquent orator, that it often makes grammatical errors and has to start again, that it took heroic effort to train it, and that it’s unclear how much the dog really understands.

Well, yes. What AlphaCode does really is remarkable. But Aaronson goes on to note in his next paragraph:

It’s not obvious how you go from solving programming contest problems to conquering the human race or whatever, but I feel pretty confident that we’ve now entered a world where “programming” will look different.

And that too. Forget about "conquering the human race or whatever." That's self-indulgent tech-bro nonsense. But changing how code gets written? Sure, that's happening.
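For concreteness, here is a minimal sketch, in Python and entirely my own, of the generate-filter-select loop Aaronson describes: sample a huge pool of candidate programs, throw away everything that fails the provided example inputs and outputs, then pick a handful of survivors to submit. The names sample_candidate and passes_examples are placeholders I've invented for illustration; this is not DeepMind's code, and AlphaCode's real selection step clusters candidates by behavior rather than choosing at random.

import random
from typing import Callable, List, Tuple

Candidate = Callable[[str], str]   # a candidate program: contest input -> output
Example = Tuple[str, str]          # (sample input, expected output)

def passes_examples(candidate: Candidate, examples: List[Example]) -> bool:
    """Keep a candidate only if it reproduces the provided example outputs."""
    try:
        return all(candidate(x) == y for x, y in examples)
    except Exception:
        return False

def pick_submissions(sample_candidate: Callable[[], Candidate],
                     examples: List[Example],
                     n_candidates: int = 1_000_000,
                     n_submissions: int = 10) -> List[Candidate]:
    # 1. Generate a huge pool of candidate programs.
    pool = (sample_candidate() for _ in range(n_candidates))
    # 2. Discard the vast majority by running them on the example data.
    survivors = [c for c in pool if passes_examples(c, examples)]
    # 3. Thousands may remain; AlphaCode clusters them by behavior and submits
    #    a few representatives. Random choice here is only a stand-in for that.
    return random.sample(survivors, k=min(n_submissions, len(survivors)))

Even a toy version like this makes the shape of the trick visible: almost all the leverage is in the filtering step, which is only possible because contest problems come packaged with example inputs and outputs, exactly the artificiality Aaronson concedes.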

The thing is, we're moving into new and unexplored territory – we have been for decades now – and we don't know how to evaluate what we discover. As I said the other day, the space seems to get larger as we move into it. We have no metrics, not even informal ones. So the tendency seems to be that we go binary: Either we trivialize the result (that's nothing) or we exaggerate its significance (the Singularity is coming!).

Well, no, it's not either of those things. It's somewhere in between. Just where, we haven't a clue. I found myself in that situation when GPT-3 came out. I dealt with it by writing 14K words: GPT-3: Waterloo or Rubicon? Here be Dragons. That paper is somewhere between nothing to see here and it's alive! But where? How do we chart this new territory?

Postscript:

Postscript 2: Matt Yglesias has a column that inadvertently illustrates the problem, The case for Terminator analogies. He thinks the AI alignment problem is real, but that the Skynet of the Terminator films is not what alignment theorists have in mind. He's right about that. In the course of developing his argument he says:

A lot of smart people like to argue about the timelines here, so as someone who is not so smart, I would just make two observations: (1) we have a clear precedent for very rapid progress in the field of computers, and (2) AI does appear to be progressing very rapidly recently. A French AI beat eight world champions at bridge two weeks ago. Last week, OpenAI released DALL-E-2, which draws images based on natural language input, and Google released PaLM, which seems like a breakthrough in computer reasoning.

There's the problem. Yes, there's been rapid progress – the bridge example is particularly interesting. But how do we calibrate it? This is going to keep cropping up.

One issue is that the idea of progress suggests linear movement along a single dimension. Whatever is going on, it is multidimensional. Perhaps we should start thinking in terms of adding new dimensions to the space rather than simply advancing along one.

FWIW, back in the 1990s David Hays undertook an extensive review of the (largely empirical) literature on cultural complexity: The Measurement of Cultural Evolution in the Non-Literate World. He started out looking for one variable underlying the various metrics that had been proposed and explored. He concluded that that was not possible, at least not at the time. Instead he identified eleven aspects, as he called them, of socio-cultural organization. In effect, he argued that cultural complexity involves eleven dimensions. He rated each of several hundred societies on those aspects.
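To make the contrast with a single scale concrete, here is a toy sketch of what a multidimensional rating looks like in code. The aspect labels are placeholders of my own, not Hays's actual eleven; the only point is that each society gets a profile, a vector of ratings, rather than one number to rank by.

from dataclasses import dataclass
from typing import Dict, Tuple

# Invented placeholder labels, not Hays's actual eleven aspects.
ASPECTS = [f"aspect_{i:02d}" for i in range(1, 12)]

@dataclass
class Society:
    name: str
    ratings: Dict[str, int]  # aspect label -> ordinal rating

    def profile(self) -> Tuple[int, ...]:
        # A profile is a point in an 11-dimensional space. There is no single
        # axis of "progress" to project it onto without losing information.
        return tuple(self.ratings.get(a, 0) for a in ASPECTS)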
