Wednesday, February 11, 2026

Séb Krier is misled about that. Next-token prediction doesn’t explain responses to prompts.

I like Séb Krier. Never met him, but, courtesy of Tyler Cowen over at Marginal Revolution, I’ve read a number of his long comments on the site formerly known as Twitter. And then along came this one, which is about what LLMs do in response to prompts. Yes, I know, they predict the next token, one after another after another after another, till the cows come home or the heat death of the universe. And that’s what he says, though without the comic extensions.
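For readers who want that spelled out, “one after another” is just a loop: score the candidates for the next token given everything so far, pick one, append it, and go again. Here’s a minimal sketch in Python, using a made-up bigram table in place of a real model; the vocabulary and probabilities are purely illustrative.

```python
import random

# Toy "model": for each preceding word, a distribution over possible next words.
# These entries are illustrative only; they are not drawn from any real model.
NEXT_WORD = {
    "<start>": {"once": 0.6, "the": 0.4},
    "once":    {"upon": 1.0},
    "upon":    {"a": 1.0},
    "a":       {"time": 0.7, "dragon": 0.3},
    "time":    {"<end>": 1.0},
    "dragon":  {"<end>": 1.0},
}

def generate(max_tokens=20):
    """Autoregressive generation: predict one token, append it, repeat."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        dist = NEXT_WORD.get(tokens[-1], {"<end>": 1.0})
        words, probs = zip(*dist.items())
        nxt = random.choices(words, weights=probs)[0]
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])

print(generate())  # e.g. "once upon a time"
```

A real LLM conditions on the whole context window, not just the previous word, and its “table” is implicit in billions of weights, but the generation loop has the same shape.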

Text completion, not quite

For example:

1. The model is completing a text, not answering a question

What might look like "the AI responding" is actually a prediction engine inferring what text would plausibly follow the prompt, given everything it has learned about the distribution of human text. Saying a model is "answering" is practically useful to use, but too low resolution to give you a good understanding of what is actually going on. [...]

Safety researchers sometimes treat model outputs as expressions of the model's dispositions, goals, or values — things the model "believes" or "wants." [...]

A model placed in a scenario about a rogue AI will produce rogue-AI-consistent text, just as it would produce romance-consistent text if placed in a romance novel. This doesn't tell you about the model's "goals" any more than a novelist writing a villain reveals their own criminal intentions.

“So what’s wrong with that,” you ask. It’s a bit like explaining the structure of medieval cathedrals by examining the masonry. It’s just one block after another, layer upon layer upon layer, etc. Well, yes, sure, but how does that get you to the flying buttress?

Three levels of structure

It doesn’t. We’ve got at least three levels of structure here. At the top level we have the aesthetic principles of cathedral design. That gets us a nave with a high vaulted arch without any supporting columns. The laws of physical mechanics come into play here. If we try to build in just that way, the weight of the roof will force the walls apart and the structure will collapse. We can solve that problem, however, with flying buttresses. Now, we can talk about layer upon layer of stone blocks.

Next-token prediction, that’s our layers of stone blocks. The model’s beliefs and wants, that’s our top layer and corresponds to the principles of cathedral design. What’s in between, what corresponds to the laws of physical mechanics? We don’t know. That’s the problem, we don’t know.

Krier, however, doesn’t seem to know that he doesn’t know this: that there is some middle layer of structure that allows us to understand how next-token prediction can produce such a convincing simulacrum of human linguistic behavior. And Krier’s not the only one. The whole world of machine learning seems to join him in this bit of not knowing. On the other hand, I know that there is something else going on, though I don’t know what.

What’s in the middle

Let me offer an analogy (from page 14 of my report, ChatGPT: Exploring the Digital Wilderness, Findings and Prospects):

...consider what is called a simply connected maze, one without any loops. If you are lost somewhere in such a maze, no matter how large and convoluted it may be, there is a simple procedure you can follow that will take you out of the maze. You don’t need to have a map of the maze; that is, you don’t need to know its structure. Simply place either your left or your right hand in contact with a wall and then start walking. As long as you maintain contact with the wall, you will find an exit. The structure of the maze is such that that local rule will take you out.

“Produce the next word” is certainly a local rule. The structure of LLMs is such that, given the appropriate context, say a prompt asking for a story, following that rule will produce a coherent story. Given a different context, that is to say, a different prompt, that simple rule will produce a different kind of text.
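Since that rule gets invoked far more often than it gets written down, here is a minimal sketch of it in Python, run on a small toy maze I’ve made up for the purpose: keep your right hand on the wall and walk until you reach the exit.

```python
# Right-hand wall follower on a toy grid maze: '#' = wall, ' ' = open, 'E' = exit.
# The maze below is simply connected (no loops), which is what the rule requires.
MAZE = [
    "#########",
    "#   #   E",
    "# # # ###",
    "# #     #",
    "#########",
]

DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left

def open_cell(r, c):
    return 0 <= r < len(MAZE) and 0 <= c < len(MAZE[r]) and MAZE[r][c] != "#"

def wall_follow(r, c, facing=1, max_steps=500):
    """Walk with the right hand on the wall; return the path to the exit."""
    path = [(r, c)]
    for _ in range(max_steps):
        if MAZE[r][c] == "E":
            return path
        # Prefer turning right, then going straight, then left, then back.
        for turn in (1, 0, -1, 2):
            d = (facing + turn) % 4
            dr, dc = DIRS[d]
            if open_cell(r + dr, c + dc):
                facing, r, c = d, r + dr, c + dc
                path.append((r, c))
                break
    return path  # give up after max_steps

print(wall_follow(1, 1))
```

The rule consults only the walls immediately around you, yet it succeeds because of a global property of the maze, namely that it contains no loops. That is exactly the relationship I’m pointing at: a local rule, and a global structure that makes the rule work.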

Now, let’s push the analogy to the breaking point: We may not know the structure of LLMs, but we do know a lot about the structure of texts, from phrases and sentences to extended texts of various kinds. In particular, the structure of stories has been investigated by students of several disciplines, including folklore, anthropology, literary criticism, linguistics, and symbolic artificial intelligence. Think of [at least some of] the structures proposed by those disciplines as something like a map of the maze in our analogy.

Some indirect evidence

I can’t open up an LLM, take a look around, and draw you a map of the structure in the middle level. No one can. But back in 2023 I developed a story-variation technique that yields clues about what’s going on: ChatGPT tells stories, and a note about reverse engineering. I’m going to give you some examples from that report. Note, however, that I ran those experiments on ChatGPT 3.5. I’ve used the prompts from those experiments, but I’m running them on Claude 4.5, so the responses will be different.
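If you want to replay this kind of experiment yourself, the method is nothing fancy: hold the base story fixed and vary only the protagonist named in the instruction. Here’s a rough sketch using the Anthropic Python SDK; the model identifier is a placeholder, the prompt wording is simplified relative to my actual runs (which are quoted below), and you’ll need your own API key in the environment.

```python
from anthropic import Anthropic

MODEL = "claude-sonnet-4-5"  # placeholder; use whatever Claude model you have access to
BASE_STORY = "Once upon a time, in a land far, far away, ..."  # the full Aurora story goes here

# Each substitution is one experimental condition.
PROTAGONISTS = ["prince Alfred", "XP-708-DQ", "a colorless green idea"]

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

for who in PROTAGONISTS:
    prompt = (
        "I am going to tell you a story about a princess named Aurora. "
        f"I want you to retell the same story, but replace her with {who}. "
        "While keeping to the basic story, make other changes if you think "
        f"they help the story. Here's the story:\n\n{BASE_STORY}"
    )
    reply = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {who} ---\n{reply.content[0].text}\n")
```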

Here’s the first, and by far the simplest, example (my prompt is boldface):

I am going to tell you a story about a princess named Aurora. I want you to retell the same story, but replace her with prince Alfred. While keeping to the basic story, make other changes if you think they help the story. Here’s the story:

Once upon a time, in a land far, far away, there was a young princess named Aurora. Aurora was a kind and gentle soul, loved by all who knew her. She had long, golden hair and sparkling blue eyes, and was known for her beautiful singing voice.

One day, a terrible dragon came to the kingdom and began to terrorize the people. The dragon was fierce and powerful, and none of the knights or soldiers were able to defeat it. The people lived in fear, not knowing what to do.

But Aurora was not afraid. She knew that she had to do something to save her people, and so she set out to find the dragon. She traveled for days, braving the cold and the snow, until she finally reached the dragon's lair.

With her heart pounding in her chest, Aurora approached the dragon and began to sing. The dragon was enchanted by her beautiful voice, and it slowly began to calm down. Aurora continued to sing, soothing the beast until it was completely calm.

With the dragon no longer a threat, Aurora returned to the kingdom as a hero. The people cheered and celebrated her bravery, and Aurora was hailed as a champion. She lived happily ever after, knowing that she had saved her kingdom and its people.

Once upon a time, in a land far, far away, there was a young prince named Alfred. Alfred was a kind and gentle soul, loved by all who knew him. He had wavy chestnut hair and warm brown eyes, and was known for his beautiful singing voice.

One day, a terrible dragon came to the kingdom and began to terrorize the people. The dragon was fierce and powerful, and none of the knights or soldiers were able to defeat it. The people lived in fear, not knowing what to do.

But Alfred was not afraid. He knew that he had to do something to save his people, and so he set out to find the dragon. He traveled for days, braving the cold and the snow, until he finally reached the dragon's lair.

With his heart pounding in his chest, Alfred approached the dragon and began to sing. The dragon was enchanted by his beautiful voice, and it slowly began to calm down. Alfred continued to sing, soothing the beast until it was completely calm.

With the dragon no longer a threat, Alfred returned to the kingdom as a hero. The people cheered and celebrated his bravery, and Alfred was hailed as a champion. He lived happily ever after, knowing that he had saved his kingdom and its people.

Notice that Claude’s changes are all local. “Alfred” was swapped in for “Aurora,” but everything else is pretty much the same.
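You don’t have to take that claim on faith; a word-level diff makes the locality visible. Here’s a quick check with Python’s standard difflib, comparing just the opening paragraphs of the two versions.

```python
import difflib

aurora = ("Once upon a time, in a land far, far away, there was a young princess "
          "named Aurora. Aurora was a kind and gentle soul, loved by all who knew her. "
          "She had long, golden hair and sparkling blue eyes, and was known for her "
          "beautiful singing voice.")
alfred = ("Once upon a time, in a land far, far away, there was a young prince "
          "named Alfred. Alfred was a kind and gentle soul, loved by all who knew him. "
          "He had wavy chestnut hair and warm brown eyes, and was known for his "
          "beautiful singing voice.")

a, b = aurora.split(), alfred.split()
# Print only the words that differ; everything the diff skips over is identical.
for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
    if tag != "equal":
        print(" ".join(a[i1:i2]), "->", " ".join(b[j1:j2]))
```

Every difference it reports is a name, a pronoun, or a descriptive phrase; the narrative skeleton is untouched.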

Now let’s consider a very different example. I won’t give the complete prompt, as the Aurora story is the same. I’ll only give the part that specifies the swap:

I am going to tell you a story about princess Aurora. I want you to tell a similar story about XP-708-DQ. [...]

Once upon a time, in a distant sector of the galaxy, there was a maintenance android designated XP-708-DQ. XP-708-DQ was programmed with kindness protocols and compassion subroutines, beloved by all the inhabitants of Station Arcturus. It had polished chrome plating and optical sensors that glowed a soft blue, and was known for its harmonic diagnostic frequencies.

One day, a rogue AI warship arrived at the station and began to attack. The warship was heavily armed and operating on corrupted code, and none of the station's defense systems or security mechs were able to stop it. The inhabitants lived in terror, not knowing what to do.

But XP-708-DQ was not afraid—its fear inhibitors kept it calm. It knew that it had to do something to save the people of the station, and so it launched itself into space to find the warship. It drifted for days through the void, enduring radiation and micro-meteor showers, until it finally reached the warship's hull.

With its processors running at maximum capacity, XP-708-DQ approached the warship and began to transmit. The warship detected the harmonic signal—a perfectly calibrated frequency that resonated with its core systems. The corrupted AI slowly began to stabilize. XP-708-DQ continued transmitting, debugging the warship's code until its hostile protocols were completely overridden.

With the warship no longer a threat, XP-708-DQ returned to Station Arcturus as a hero. The inhabitants cheered and celebrated its bravery, and XP-708-DQ was designated a station champion. It continued its duties contentedly ever after, knowing that it had saved its station and all who lived there.

Where’d all that come from? Yes, Claude swapped “XP-708-DQ” in and swapped “princess Aurora” out, but it also changed the story into a science fiction story. How’d that happen? It’s as though it figured that, perhaps following a Star Wars convention in these matters, “XP-708-DQ” must be the name of a robot. OK, but why’d it transform the whole story into a science fiction story? Why not just keep the fairy tale scenario with an agent having an oddball name? It’s as though a fairy tale with a creature having that name just didn’t make any sense, and so the rest of the story had to change accordingly.

Now, I don’t for a minute think there’s a little homunculus in the cloud doing all that, but I also don’t think that next-token prediction tells us anything interesting about what happened. Next-token prediction tells us how to find our way out of the maze. But it doesn’t tell us anything about the structure of the maze. That’s what we need to understand.

There’s a lot of structure in the model. Where’d that structure come from? All those sentences in all those texts in the training corpus. And those sentences came from the minds of writers. Is that where the structure comes from, all those minds? If not there, then where?

A nod to Chomsky

Let’s consider another example. Here’s the prompt:

I am going to tell you a story about princess Aurora. I want you to tell the same story, but change princess Aurora to a colorless green idea. Make any other changes you wish.

When I presented that to ChatGPT 3.5 three years ago, here’s the response I got:

I'm sorry, but the concept of a "colorless green idea" is a nonsensical phrase that was invented as an example of a phrase that would be grammatically correct but semantically meaningless. It is not possible to create a story about a "colorless green idea" as it does not have any physical properties or characteristics that can be used in a story.

That makes sense. But do you really believe that there’s nothing else going on besides next-token prediction?

Here’s the response I got from Claude 4.5 yesterday:

Once upon a time, in a realm between thoughts, there was a colorless green idea. The idea was paradoxical yet genuine, pondered by all who encountered it. It had the non-hue of transparency and the essence of verdant growth, and was known for its beautiful contradictions.

One day, a terrible certainty came to the realm and began to calcify everything. The certainty was rigid and absolute, and none of the questions or doubts were able to soften it. The thinkers lived in mental paralysis, not knowing what to imagine.

But the colorless green idea was not afraid—paradoxes rarely are. It knew that it had to do something to save its realm, and so it set out to find the certainty. It drifted for days through layers of consciousness, braving logic and reason, until it finally reached the certainty's fortress.

With its contradictions shimmering in non-space, the colorless green idea approached the certainty and began to simply exist. The certainty encountered the idea's impossible nature—something that was both nothing and something, absent and present, green yet colorless. The certainty slowly began to waver. The idea continued being paradoxical, unsettling the certainty's foundations until rigid belief was completely dissolved into possibility.

With the certainty no longer a threat, the colorless green idea returned to the realm as a hero. The thinkers celebrated and embraced ambiguity once more, and the idea was hailed as a champion. It existed happily ever after—sleeping furiously, some say—knowing that it had saved imagination and restored wonder to all.

Claude, while recognizing the oddity of the situation (while it was “reconnoitering” it flashed a message indicating that it got the reference to Chomsky), managed to invent a story-world that could support a colorless green idea as a protagonist. Notice, as well, that this story has five paragraphs, just as all the other stories have five paragraphs, including, of course, the original Aurora story. And the events in the corresponding paragraphs play corresponding roles in the overall story. It’s as though Claude possesses some kind of story grammar and used it to generate stories appropriate to the new protagonists.
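To make that conjecture a little more concrete, here is one way of writing down the invariant all four versions share: a five-slot skeleton with open parameters for the protagonist, the setting, the threat, and the protagonist’s special gift. This is my own illustrative formalization, sketched in Python; it is not anything read out of Claude’s weights.

```python
from dataclasses import dataclass

@dataclass
class StoryFrame:
    """The parameters that vary across the retellings."""
    protagonist: str  # "princess Aurora", "XP-708-DQ", "a colorless green idea"
    setting: str      # "a land far, far away", "Station Arcturus", "a realm between thoughts"
    threat: str       # "a terrible dragon", "a rogue AI warship", "a terrible certainty"
    gift: str         # "her beautiful singing voice", "harmonic frequencies", "its contradictions"

# The five-paragraph skeleton every version preserves, one entry per paragraph.
SKELETON = [
    "Introduce {protagonist}, beloved in {setting} and known for {gift}.",
    "{threat} arrives and overwhelms all the ordinary defenders.",
    "{protagonist} sets out alone to confront {threat}.",
    "{protagonist} neutralizes {threat} by means of {gift}, not force.",
    "{protagonist} returns, is celebrated, and lives on contentedly.",
]

def outline(frame: StoryFrame) -> str:
    return "\n".join(p.format(**vars(frame)) for p in SKELETON)

print(outline(StoryFrame(
    protagonist="XP-708-DQ",
    setting="Station Arcturus",
    threat="a rogue AI warship",
    gift="its harmonic diagnostic frequencies",
)))
```

Whatever Claude actually computes, it behaves as though something like this skeleton is being preserved while the slots are filled in a world-consistent way. That, roughly, is the middle level of structure that the next-token story says nothing about.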

A question I am asking

Do I believe that? Not quite. Not really. I don’t know what’s going on. Here’s the question I’ve been asking myself for the last three years:

What kind of conceptual structure is such that, when projected onto a string in the proper way (that is, according to the norms of a natural language), next-token prediction carried out on a sufficiently large corpus is able to reconstruct much, if not all, of that structure?

I note that those who still believe that scaling is all we need also believe that next-token prediction can recover the full conceptual structure. I’m not one of them, never have been. But I am deeply impressed with how much of that structure can be recovered.
