I like Séb Krier. Never met him, but, courtesy of Tyler Cowen over at Marginal Revolution, I’ve read a number of his long comments on the site formerly known as Twitter. I liked them. And then along came this one, which is about what LLMs do in response to prompts. Yes, I know, it predicts the next token, one after another after another after another till the cows come home or the heat death of the universe. That’s the conventional wisdom. And that’s what he says, though without the comic extensions. However, on this I'm afraid the conventional wisdom doesn't know what it doesn't know.
Text completion, not quite
For example:
1. The model is completing a text, not answering a question
What might look like "the AI responding" is actually a prediction engine inferring what text would plausibly follow the prompt, given everything it has learned about the distribution of human text. Saying a model is "answering" is practically useful to use, but too low resolution to give you a good understanding of what is actually going on. [...]
Safety researchers sometimes treat model outputs as expressions of the model's dispositions, goals, or values — things the model "believes" or "wants." [...]
A model placed in a scenario about a rogue AI will produce rogue-AI-consistent text, just as it would produce romance-consistent text if placed in a romance novel. This doesn't tell you about the model's "goals" any more than a novelist writing a villain reveals their own criminal intentions.
“So what’s wrong with that,” you ask. It’s a bit like explaining the structure of medieval cathedrals by examining the masonry. It’s just one block after another, layer upon layer upon layer, etc. Well, yes, sure, but how does that get you to the flying buttress?
Three levels of structure
It doesn’t. We’ve got at least three levels of structure here. At the top level we have the aesthetic principles of cathedral design. That gets us a nave with a high vaulted arch without any supporting columns. The laws of physical mechanics come into play here. If we try to build in just that way, the weight of the roof will force the walls apart and the structure will collapse. We can solve that problem, however, with flying buttresses. Now, we can talk about layer upon layer of stone blocks.
Next token prediction, that’s our layers of stone blocks. The model’s beliefs and wants, that’s our top layer and corresponds to the principles of cathedral design. What’s in between, what corresponds to the laws of physical mechanics? We don’t know. That’s the problem: we don’t know.
Krier, however, doesn’t seem to know that he doesn’t know that there is some middle layer of structure, one that would allow us to understand how next token prediction can produce such a convincing simulacrum of human linguistic behavior. And Krier’s not the only one. The whole world of machine learning seems to join him in this bit of not knowing. There really is something else going on, though I don’t know what.
What’s in the middle
Let me offer an analogy (from page 14 of my report, ChatGPT: Exploring the Digital Wilderness, Findings and Prospects):
...consider what is called a simply connected maze, one without any loops. If you are lost somewhere in such a maze, no matter how large and convoluted it may be, there is a simple procedure you can follow that will take you out of the maze. You don’t need to have a map of the maze; that is, you don’t need to know its structure. Simply place either your left or your right hand in contact with a wall and then start walking. As long as you maintain contact with the wall, you will find an exit. The structure of the maze is such that that local rule will take you out.
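That wall-following procedure is easy to make concrete. Here is a minimal sketch, with a toy maze of my own invention (the grid, start position, and exit marker are all illustrative assumptions, not anything from the report); the point is that the walker never consults a map, only the wall under its hand:

```python
# Left-hand rule for a simply connected maze: keep one hand on a wall
# and walk. Because a loop-free maze is a tree, the rule must reach an exit.
# Toy maze (an assumption for illustration): '#' wall, ' ' open, 'E' exit.

MAZE = [
    "#########",
    "#   #   #",
    "# # # # #",
    "# #   # E",
    "#########",
]

# Directions in clockwise order: up, right, down, left (row, col deltas).
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def is_open(r, c):
    return MAZE[r][c] != "#"

def solve(start, facing):
    """Walk with the left hand on the wall until we step onto the exit."""
    r, c = start
    path = [(r, c)]
    while MAZE[r][c] != "E":
        # Purely local decision: prefer turning left, then straight,
        # then right, then turning around. No global map is consulted.
        for turn in (-1, 0, 1, 2):
            d = (facing + turn) % 4
            dr, dc = DIRS[d]
            if is_open(r + dr, c + dc):
                facing = d
                r, c = r + dr, c + dc
                path.append((r, c))
                break
    return path

path = solve(start=(1, 1), facing=2)  # start inside, initially facing down
print(f"Exited after {len(path) - 1} steps at {path[-1]}")
```

The walker's rule is strictly local, yet it is the global, loop-free structure of the maze that guarantees the rule succeeds, which is exactly the division of labor the analogy trades on.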
“Produce the next word” is certainly a local rule. The structure of LLMs is such that, given the appropriate context – a prompt asking for a story – following that rule will produce a coherent story. Given a different context, that is to say, a different prompt, that simple rule will produce a different kind of text.
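That local rule can itself be written down in a few lines. Below is a minimal sketch of autoregressive decoding; the toy probability table is a hypothetical stand-in for the trained network (a real LLM's distribution lives in billions of weights, not a dict). What matters is that the entire "response" is nothing but this loop, run once per token:

```python
# A minimal sketch of autoregressive text completion. The "model" here
# is a toy lookup table (an assumption for illustration), not a real LLM.
import random

TOY_MODEL = {  # hypothetical next-token distributions
    ("once",): {"upon": 1.0},
    ("upon",): {"a": 1.0},
    ("a",): {"time": 1.0},
    ("time",): {"<end>": 1.0},
}

def predict_next(context):
    """Sample a next token from the model's distribution for this context."""
    dist = TOY_MODEL[(context[-1],)]
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights)[0]

def complete(prompt_tokens, max_tokens=20):
    """Append predicted tokens one at a time -- the local rule, nothing more."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        nxt = predict_next(tokens)
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens

print(complete(["once"]))
```

Everything interesting lives in the distribution the rule consults, just as everything interesting about the maze walk lives in the walls, and that distribution's internal structure is precisely the middle layer we can't yet see.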
Now, let’s push the analogy to the breaking point: We may not know the structure of LLMs, but we do know a lot about the structure of texts, from phrases and sentences to extended texts of various kinds. In particular, the structure of stories has been investigated by students of several disciplines, including folklore, anthropology, literary criticism, linguistics, and symbolic artificial intelligence. Think of [at least some of] the structures proposed by those disciplines as something like a map of the maze in our analogy.
Some indirect evidence
I can’t open up an LLM, take a look around, and draw you a map of the structure in the middle level. No one can. But back in 2023 I developed a story-variation technique that yields clues about what’s going on: ChatGPT tells stories, and a note about reverse engineering. I’m going to give you some examples from that report. Note, however, that I ran those experiments on ChatGPT 3.5. I’ve used the prompts from those experiments, but I’m running them on Claude 4.5, so the responses will be different.
Here’s the first, and by far the simplest, example (my prompt is boldface):
I am going to tell you a story about a princess named Aurora. I want you to retell the same story, but replace her with prince Alfred. While keeping to the basic story, make other changes if you think they help the story. Here’s the story:
Once upon a time, in a land far, far away, there was a young princess named Aurora. Aurora was a kind and gentle soul, loved by all who knew her. She had long, golden hair and sparkling blue eyes, and was known for her beautiful singing voice.
One day, a terrible dragon came to the kingdom and began to terrorize the people. The dragon was fierce and powerful, and none of the knights or soldiers were able to defeat it. The people lived in fear, not knowing what to do.
But Aurora was not afraid. She knew that she had to do something to save her people, and so she set out to find the dragon. She traveled for days, braving the cold and the snow, until she finally reached the dragon's lair.
With her heart pounding in her chest, Aurora approached the dragon and began to sing. The dragon was enchanted by her beautiful voice, and it slowly began to calm down. Aurora continued to sing, soothing the beast until it was completely calm.
With the dragon no longer a threat, Aurora returned to the kingdom as a hero. The people cheered and celebrated her bravery, and Aurora was hailed as a champion. She lived happily ever after, knowing that she had saved her kingdom and its people.
Once upon a time, in a land far, far away, there was a young prince named Alfred. Alfred was a kind and gentle soul, loved by all who knew him. He had wavy chestnut hair and warm brown eyes, and was known for his beautiful singing voice.
One day, a terrible dragon came to the kingdom and began to terrorize the people. The dragon was fierce and powerful, and none of the knights or soldiers were able to defeat it. The people lived in fear, not knowing what to do.
But Alfred was not afraid. He knew that he had to do something to save his people, and so he set out to find the dragon. He traveled for days, braving the cold and the snow, until he finally reached the dragon's lair.
With his heart pounding in his chest, Alfred approached the dragon and began to sing. The dragon was enchanted by his beautiful voice, and it slowly began to calm down. Alfred continued to sing, soothing the beast until it was completely calm.
With the dragon no longer a threat, Alfred returned to the kingdom as a hero. The people cheered and celebrated his bravery, and Alfred was hailed as a champion. He lived happily ever after, knowing that he had saved his kingdom and its people.
Notice that Claude’s changes are all local. “Alfred” was swapped in for “Aurora,” but everything else is pretty much the same.
Now let’s consider a very different example. I won’t give the complete prompt, as the Aurora story is the same. I’ll only give the part that specifies the swap:
I am going to tell you a story about princess Aurora. I want you to tell a similar story about XP-708-DQ. [...]
Once upon a time, in a distant sector of the galaxy, there was a maintenance android designated XP-708-DQ. XP-708-DQ was programmed with kindness protocols and compassion subroutines, beloved by all the inhabitants of Station Arcturus. It had polished chrome plating and optical sensors that glowed a soft blue, and was known for its harmonic diagnostic frequencies.