Friday, March 21, 2025

What's going on inside so-called "reasoning model" LLMs

Melanie Mitchell has an interesting piece up: Artificial intelligence learns to reason, Science, 20 Mar 2025, Vol 387, Issue 6740, DOI: 10.1126/science.adw5211

After working through an example given to one of these large reasoning models (LRMs), she sketches how they work:

An LRM is built on top of a pretrained “base model,” an LLM such as GPT-4o. In the case of DeepSeek, the base model was their own pretrained LLM called V3. (The naming of AI models can get very confusing.) These base models have been trained on huge amounts of human-generated text, where the training objective is to predict the next token (i.e., word or word-part) in a text sequence.

The base model is then “post-trained”—that is, further trained but with a different objective: to specifically generate chains of thought, such as the one that o1 generated for the “sisters” puzzle. After this special training, when given a problem, the LRM does not generate tokens one at a time but generates entire chains of thought. Such chains of thought can be really long. Unlike, say, GPT-4o, which generates a relatively small number of tokens, one at a time, when given a problem to solve, models like o1 can generate hundreds to thousands of chain-of-thought steps, sometimes totaling hundreds of thousands of generated tokens making up the chain-of-thought steps (most of which are not revealed to the user). And because customers using these models at a large scale are charged by the token, this can get quite expensive.

Thus, an LRM does substantially more computation than an LLM to generate an answer. This computation might involve generating many different possible chains of thought, using another AI model to rate each one and returning the one with the highest rating, or doing a more sophisticated kind of search through possibilities, akin to the “lookahead” search that chess- or Go-playing programs do to figure out a good move. When using a model such as o1, these computations happen behind the scenes; the user sees only a summary of the chain-of-thought steps generated.
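To make one of the selection schemes Mitchell describes a bit more concrete, here is a minimal sketch of best-of-N sampling: the model samples several candidate chains of thought, a separate rater model scores each one, and the highest-scoring chain is returned. The generate_chain and rate_chain functions below are hypothetical stand-ins for the two models, and the token price in the comment is an illustrative assumption, not a quoted figure.

    import random

    def generate_chain(problem: str) -> str:
        # Stand-in for the LRM sampling one chain of thought.
        return f"candidate reasoning for {problem!r} (sample {random.randint(0, 9999)})"

    def rate_chain(chain: str) -> float:
        # Stand-in for a separate rater/reward model scoring a chain.
        return random.random()

    def best_of_n(problem: str, n: int = 8) -> str:
        # Sample n chains of thought and keep the one the rater likes best.
        candidates = [generate_chain(problem) for _ in range(n)]
        return max(candidates, key=rate_chain)

    # Rough cost arithmetic: if each hidden chain runs to, say, 50,000 tokens and the
    # provider charges a hypothetical $60 per million output tokens, eight samples
    # cost about 8 * 50_000 / 1_000_000 * 60 = $24 for a single query.
    print(best_of_n("the 'sisters' puzzle"))

The lookahead-search variant Mitchell mentions would replace best_of_n with a search over partial chains, but the economics are the same: more sampled tokens, more money.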

That kind of thing is quite familiar to me. Years ago I had to prepare abstracts for what was then The American Journal of Computational Linguistics (now shortened to Computational Linguistics). To do that I had to read widely in AI and computational linguistics. Many different schemes were created for generating, keeping track of, and selecting among partial results of a computation, such as parsing a sentence. One of the things that killed that intellectual program was that the number of partial results exploded beyond the ability of the hardware to handle it all.
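To put a number on that explosion: the count of distinct binary parse trees over an n-word sentence grows with the Catalan numbers, roughly like 4^n. A quick back-of-the-envelope check, purely illustrative and not drawn from any particular parser:

    from math import comb

    def catalan(n: int) -> int:
        # nth Catalan number: the count of distinct binary trees with n internal nodes.
        return comb(2 * n, n) // (n + 1)

    for words in (5, 10, 20, 40):
        # Binary parse trees over a sentence of `words` words: Catalan(words - 1).
        print(words, catalan(words - 1))
    # Output: 14; 4,862; about 1.8 billion; roughly 7e20 -- far beyond what the
    # hardware of that era could track exhaustively.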

Mitchell then goes on to say more about the post-training regime. She notes that there is considerable debate over whether what these systems are doing is "real" reasoning, observing that

“Reasoning” is an umbrella term for many different types of cognitive problem-solving processes; humans use a multiplicity of strategies, including relying on memorized steps, specific heuristics (“rules of thumb”), analogies to past solutions, and sometimes even genuine deductive logic.

She goes on to say:

In LRMs, the term “reasoning” seems to be equated with generating plausible-sounding natural-language steps to solving a problem, and the extent to which this provides general and interpretable problem-solving abilities is still an open question. The performance of these models on math, science, and coding benchmarks is undeniably impressive. However, the overall robustness of their performance remains largely untested, especially for reasoning tasks that, unlike those the models were tested on, don’t have clear answers or cleanly defined solution steps, which is the case for many, if not most, real-world problems, not to mention “fixing the climate, establishing a space colony, and the discovery of all of physics,” which are achievements OpenAI’s Sam Altman expects from AI in the future.

There's more at the link. The whole piece is worth reading.
