Thursday, June 23, 2022

The Two Voices of Scott Alexander on Rogue AI

Back at the end of February, Tyler Cowen had a post entitled "Are nuclear weapons or Rogue AI the more dangerous risk?" He linked to a long Scott Alexander post, "Biological Anchors: A Trick That Might or Might Not Work," about some recent web discourse on predicting the emergence of human-level AI. Then, at the very end of his long post, seemingly out of nowhere, Alexander was fretting about the danger of rogue AI.

I seem to have gotten trapped in Alexander's post. I have read it several times, even taking notes. It's a very interesting document and merits some discussion of how it is constructed. I'm not so much concerned about whether or not Alexander's assessment of the prospects of human-level AI is valid as I am about the convoluted nature of his post.

Two voices

Alexander writes the post in two voices. While I've not read a lot of his material, I've read enough to know that he's a careful and skilled writer. If he speaks through two voices, it must be because whatever he wants to convey arises from the interaction between them and cannot be stated within a single voice.

Let’s call one of the voices the Impersonal voice. Most of the post is written in that voice, which is the voice in which he’s written most of the posts I’m familiar with. Let’s call the other voice the Personal voice. By word-count it’s by far the lesser voice, but it packs a strong rhetorical punch.

Let's look at the two strongest statements from the Personal voice. The first, and I believe longest, section of the post is Alexander's account of Ajeya Cotra's long report, "Forecasting TAI with biological anchors," which I've not read (though I've read some of what Holden Karnofsky says about it). Very near the end of this section the Personal voice makes a strong statement (though this is not the first appearance of this voice):

One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap?

Our second example comes near the end of the post, when Alexander begins his own assessment of things:

Oh God, I have to write some kind of conclusion to this post, in some way that suggests I have an opinion, or that I’m at all qualified to assess this kind of research. Oh God oh God.

Phrases like "total garbage," "grade A crap," and "Oh God" are not appropriate to the work Alexander is doing through his standard, Impersonal voice. They signal that we are listening to a different voice. This voice expresses a merely personal attitude and is quite different from the objectivity sought in the Impersonal voice.

Taken at face value, the second quoted statement says Alexander doesn't feel (technically) qualified to judge this material. As such, it also tells us why he'd made that first statement, and in that voice. That first statement places the assertion that Cotra's report is nonsense into the record. By couching that assertion in the words and manner of the Personal voice, Alexander separates it from his Impersonal summary of the report. In effect: the guy who summarized the report is not the guy who thinks it's nonsense. The guy who summarized the report doesn't feel competent to assess it, but happens to be closely coupled to the guy who has deep doubts.

From parody to grudging affirmation to confusion

So, Alexander has gotten a statement of deep doubt into the record. What happens next? He goes back into the Impersonal voice and invites us to

Imagine a scientist in Victorian Britain, speculating on when humankind might invent ships that travel through space. He finds a natural anchor: the moon travels through space! He can observe things about the moon: for example, it is 220 miles in diameter (give or take an order of magnitude). So when humankind invents ships that are 220 miles in diameter, they can travel through space!

He then spins out that tale and includes a helpful chart and a picture. It’s absurd – he does slip in a wink or two. It’s a parody of the methodology in Cotra’s report. A parody is not an argument, but it clarifies Alexander’s fears about the report.

Then he goes into his second major section, an account of Eliezer Yudkowsky's critique, which takes the form of a long post written as a dialogue. I've taken a look at that post, but haven't read the whole thing. Alexander opens this section by asserting, "Eliezer Yudkowsky presents a more subtle version of these kinds of objection in an essay…" I won't bother to say anything about that beyond noting that Yudkowsky thinks Cotra's method is useless for estimating the arrival of Transformative AI. Alexander may not feel qualified to critique Cotra's work, but Yudkowsky certainly does. And why not? As Alexander notes: "...he did found the field [AI alignment], so I guess everyone has to listen to him."[1]

When he’s finished with Yudkowsky, Alexander discusses comments from other places (LessWrong, AI Impacts, and OpenPhil) and finally offers his own evaluation, which he opens with the “Oh God” statement I’ve already quoted. He says a thing or two and arrives at this:

Given these two assumptions - that natural artifacts usually have efficiencies within a few OOM [orders of magnitude] of artificial ones, and that compute drives progress pretty reliably - I am proud to be able to give Ajeya’s report the coveted honor of “I do not make an update of literally zero upon reading it”.

That still leaves the question of “how much of an update do I make?” Also “what are we even doing here?”

I take it that the passages he puts in quotes are being spoken through the Personal voice.

Let's look at the first one. It's stated in informal Bayesian terms. It also feels arch and indirect. "I do not make an update of literally zero"? What's that? He's refrained from writing "0" in a ledger somewhere? Whereas if the report had been less convincing, he'd have

  • opened up that ledger,
  • added a line,
  • placed “Ajeya’s report” in the Argument column, and
  • written “0” in the Effect on Me column.

On the contrary, the report has had some non-zero effect on him. But then he attempts to run away: “what are we even doing here?”
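
(A side note for readers outside the rationalist idiom: an "update," in this jargon, is the Bayesian move of revising a prior belief in light of new evidence. Roughly, with H some hypothesis, say a date for Transformative AI, and E the evidence, here Cotra's report:

P(H | E) = P(E | H) × P(H) / P(E)

An "update of literally zero" would mean the posterior P(H | E) comes out identical to the prior P(H), that is, reading the report left his credence exactly where it started. Alexander's point, arch as it is, is that it didn't.)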

A couple paragraphs later: “This report was insufficiently different from what I already believed for me to need to worry about updating from one to the other.” I’m not sure what to make of this. If his prior belief had been quite different from the report’s conclusion, then a decision to stick with that prior belief would represent lack of faith in the report (which he doesn’t feel competent to judge). In contrast, a decision to revise his belief in the direction indicated by the report would represent faith in Cotra (and her colleagues) despite his lack of (technical) qualifications for judging the report. So, he can’t judge the report, doesn’t think it’s wrong, but doesn’t think it’s right enough to lead him to change his mind.

It's as though he's playing the role of Penelope in the Odyssey. She tells her suitors she'll pick one when she's done weaving a burial shroud for Laertes. They see her diligently weaving during the day. But then at night, what does she do? She undoes the weaving she'd done during the day, so that she can put off the day she has to pick one.

“I’m already scared”

And now, at long last, Alexander gets around to what was obviously on his mind from the very beginning: fear of a rogue AI. He's said very little about that up to this point, which is strange in itself, but now he comes out with it.

Alexander circles back to Yudkowsky: "The more interesting question, then, is whether I should update towards Eliezer's slightly different distribution, which places more probability mass on earlier decades." Yudkowsky, however, refuses to give dates: "I consider naming particular years to be a cognitively harmful sort of activity..." Incidentally, that sounds like self-regarding grandstanding from Yudkowsky. Perhaps, as people surround him asking for the date, he passes out gilded fortune cookies as souvenirs.

Alexander:

So, should I update from my current distribution towards a black box with “EARLY” scrawled on it?

What would change if I did? I’d get scared? I’m already scared. I’d get even more scared? Seems bad.

That’s not the end. We’ve got two or three more paragraphs. But those paragraphs don’t change the fact that Alexander is scared. They just mix a bit of wit into the contemplation of doom.

Alexander and his community

What are we to make of all this?

I don’t quite know. But then neither does Alexander.

I note, however, that he wrote that post for a community he's been cultivating for almost a decade. He's written on a wide variety of topics. He's conducted surveys, held book review contests, and interacted with that community in various ways. That long, ambivalent post, spoken through two voices, that's the post he felt he owed that community. To what extent is Alexander's ambivalence a reflection of attitudes in that community?

On the one hand, there's the apparent assumption that Transformative AI is on the way, come hell or high water. That is coupled with interest in the arcane technical minutiae and leaps of epistemic faith required to issue a long report predicting that arrival by comparison with biological information processing in 1) the human brain, 2) a human life, 3) the evolution of life on earth, and 4) the genome, all measured in FLOPS (floating-point operations per second).[2] That's one thing.

And then there's the correlative assumption, perhaps not shared by all but nonetheless widespread, that the arrival of Transformative AI brings with it the danger that the AI will turn against humanity and transform the earth into a paperclip factory, metaphorically speaking. To the extent that the rhetorical structure of his post is responding to something in his audience (what? a fracture? an ambivalence?), why does Alexander hold it in reserve until the end, as though it were a shameful secret?

Notes

[1] That depends on what one thinks of the field of AI alignment. Color me skeptical. The idea that we are under not-so-distant threat from an AI hell-bent on our destruction strikes me as conspiracy theorizing directed at technology no one knows how to build.

[2] In his history of technology, David Hays tells us that

... the wheel was used for ritual over many years before it was put to use in war and, still later, work. The motivation for improvement of astronomical instruments in the late Middle Ages was to obtain measurements accurate enough for astrology. Critics wrote that even if the dubious doctrines of astrology were valid, the measurements were not close enough for their predictions to be meaningful. So they set out to make their instruments better, and all kinds of instrumentation followed from this beginning.

I feel a bit like that about the topics of Ajeya Cotra's report. They are interesting and important in themselves for what they tell us about the world. They need not be yoked to the task of predicting future technology.
