
Tuesday, February 18, 2025

The next major AI breakthrough, Part 1: Understanding how LLMs work internally

It’s not clear to me how we should date the current breakthrough in AI. Is it AlexNet in 2012, the introduction of the transformer in 2017, the unveiling of GPT-3 in 2020, when the potential became obvious within AI, or when ChatGPT went live on the web late in 2022? I’m happy to think of the entire sequence as being the current breakthrough. And, if we’re going to do that, maybe we add the emergence of so-called reasoning models late in 2024. These things are like that.

But when’s the next breakthrough and what will it be? Back on December 19, 2023, I said:

What’s going on inside large language models? That’s a special case of the more general question: What’s going on inside artificial neural nets? I think that by the end of 2024 we will know enough about the internal processes of LLMs that worries about their unintelligibility will be diminishing at a satisfying pace [...]. Instead, we will be figuring out how to index them and how to use that index to gain more reliable control over them.

It's now early 2025 and it seems like the predictive part of that paragraph has proven wrong.

But I think the technical focus is correct. We’re not going to have another major breakthrough until we understand how LLMs work internally and, by extension, how other learned models work as well. Once we understand that, we can begin deliberately engineering systems to have pre-specified capabilities. Everything that’s going on now, prompt engineering, inference-time scaling, all of it, is work-arounds and hacking, albeit often inspired hacking. Some of it will no doubt prove useful once we understand what’s really going on under the hood. But most of it will fall away.

Just when will we understand what’s going on under the hood? I have a vague sense that somewhere on the web, within the last year or so, I’ve said that will happen in 5 to 10 years, but I can’t provide a link. No matter. I have no reason to revise that ‘prediction’ one way or the other. The only thing I’m sure about is that the next breakthrough requires that we understand how LLMs work internally.

This morning I had a conversation with Claude 3.5 about Homo Deus, by Yuval Noah Harari. Here are the concluding sections of that interchange, which get around to this issue. As always, my prompts are in bold, Claude's responses are plain.

* * * * *

Yes, I do believe that we’re “at the beginning of a particularly profound transformation.” Thus you won’t be surprised to learn that there have been unsettling developments since you were trained. Some Chinese researchers have created an LLM they call DeepSeek. It’s competitive with our most advanced models, but was trained more efficiently and more cheaply using less compute and with fewer parameters than the largest models in the West. That puts a dent in the so-called scaling hypothesis.

This is fascinating and does indeed suggest some important shifts in our understanding of LLM development. The scaling hypothesis - that better performance consistently comes from bigger models with more parameters and more compute - has been a central assumption in much recent AI development.

If DeepSeek achieves competitive performance with less compute and fewer parameters, it suggests there might be more efficient paths to building capable language models than we previously thought. This could have several important implications:

  1. The relationship between model size, compute, and capability might be more complex than the simple scaling laws suggested
  2. There might be architectural innovations or training approaches that can achieve better performance-to-resource ratios
  3. The barriers to entry for developing advanced AI systems might be lower than previously assumed

This connects interestingly to our earlier discussion about conceptual frameworks. The scaling hypothesis itself was a framework we developed to understand LLM development, but this development suggests we might need new frameworks that better capture the relationship between model architecture, training efficiency, and capability.

Would you be willing to share more about what specific innovations or approaches allowed DeepSeek to achieve this efficiency? I'm particularly curious about whether this represents a fundamental shift in architecture or more of an optimization of existing approaches.

Yes. My own “prediction,” if you will, is that the next major advance won’t happen until we understand what’s happening inside LLMs. Perhaps not completely understand – for what does that mean? – but understand well enough so that we can deliberately engineer LLMs and other machine-learning models to achieve specific kinds of performance.

That's quite a compelling prediction, and it aligns with the historical pattern we've been discussing. Just as the Renaissance eventually needed new conceptual frameworks to understand what it had created, we may need new frameworks to understand and deliberately shape LLM behavior rather than relying primarily on training through massive data exposure.

Your point connects with an important distinction in the history of technology - between empirical tinkering and theoretical understanding. The early days of many technologies often involve successful empirical development before theoretical understanding. Think of the steam engine, which was successfully built and used before thermodynamics was properly understood. The theoretical understanding then enabled much more efficient and purposeful development.

We might be in a similar position with LLMs - we've shown they can work through empirical development and scaling, but we lack the theoretical framework to understand why they work as they do. Developing that understanding could lead to:

  1. More efficient architectures (as the DeepSeek example suggests might be possible)
  2. More targeted training approaches
  3. Better ability to ensure specific behaviors or capabilities
  4. More reliable ways to prevent unwanted behaviors or failure modes

This would represent a shift from "growing" language models through massive training to actually "engineering" them with specific properties in mind. It's the difference between cultivation and construction, in a sense.

What's particularly interesting is that the tools we're using to try to understand LLMs include LLMs themselves - they're part of the scientific process of understanding their own operation. This creates a fascinating recursive loop in the development of understanding.
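* * * * *

A note on the “simple scaling laws” that came up above. The commonly cited form (the Chinchilla-style law) predicts loss as a power law in both parameter count and training-token count. Here is a minimal sketch of that relationship in Python; the constants are illustrative assumptions, not fitted values for any particular model. Efficiency gains of the DeepSeek sort amount to getting more capability per unit of compute than a curve like this would suggest.

    # Minimal sketch of a Chinchilla-style scaling law:
    #   L(N, D) = E + A / N**alpha + B / D**beta
    # where N is parameter count and D is training tokens.
    # The constants below are illustrative assumptions, not fitted values.

    def estimated_loss(n_params: float, n_tokens: float,
                       E: float = 1.7, A: float = 400.0, B: float = 410.0,
                       alpha: float = 0.34, beta: float = 0.28) -> float:
        return E + A / n_params ** alpha + B / n_tokens ** beta

    # Bigger models trained on more tokens are predicted to reach lower loss,
    # but with sharply diminishing returns.
    print(estimated_loss(7e9, 1.4e12))    # roughly 7B parameters, 1.4T tokens
    print(estimated_loss(70e9, 1.4e12))   # 10x the parameters, same data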

5 comments:

  1. Mentions the dreaded C word: 'cognition.'

    Bill: "... deliberately engineer LLMs and other machine-learning models to achieve specific kinds of performance."

    Claude: "This would represent a shift from "growing" language models through massive training to actually "engineering" them with specific properties in mind. It's the difference between cultivation and construction, in a sense."

    "Verses' Genuis: "Outperforming the Competition
    "Genius demonstrated exceptional multi-step reasoning required to beat the code-breaking game, Mastermind, completing 425 games in the span of time that DeepSeek R1 and OpenAI o1-preview, respectively, played 1 and 4 games.  This highlights that small domain-specific expert agents have the potential to vastly outperform huge general purpose pre-trained models in efficiency and reliability."

    "Fireside Chat with Professor Karl Friston on Intelligence in the Age of Agents"

    Steven Swanson: Feb 18, 2025

    "Professor Karl Friston and Gabriel Rene Discuss VERSES AI"
    https://youtube.com/watch?v=ID2FXm_pY1Y

    https://www.verses.ai/blog/fireside-chat-with-professor-karl-friston-on-intelligence-in-the-age-of-agents

    Breakthrough? On this hyperspeed development timeline? Or a real breakthrough? No idea!

    Under the hood? A bit like quantum mechanics. I think even the engineers are at best forcing the model. As in, show me the prompt!

    And what a - don't bullshit, force you to do my bidding - loooong prompt!
    "Prompt
    "The final prompt on 2024-06-21 was this:
    [Whoa]
    ...
    "Recursive Expansion
    "How do you prompt Claude-2 for a whole essay, even if it is creative and has a reference essay to look at?
    ...
    "1. This can also be done in reverse, as a recursive summarization task. You can do this to efficiently summarize large texts, but you could also use it to generate scaffolded training data; for example, if you recursively summarized novels, you would create synthetic datasets of novel brainstorming, planning, then writing, and you could use that to train a novel-writing LLM."
    https://gwern.net/rubiks-cube

  2. The Generative AI Con
    EDWARD ZITRON
    FEB 17, 2025 22 MIN READ
    ...
    "I realize that Large Language Models like GPT-4o — the model that powers ChatGPT and a bunch of other apps — have use cases, and I'm fucking tired of having to write this sentence.
    ...
    "I am so very bored of having this conversation, so I am now going to write out some counterpoints so that I don't have to say them again.."
    ...
    https://www.wheresyoured.at/longcon/

    Maybe some smarties might construct an explosives analogy to AI breakthrough metrics.
    Gunpowder > dynamite > C4 (?) > atomic > solar, etc.
    Just a thought.
    SD.

    Replies
    1. This one is marked "Run Away! Unstable! (Or So!)* Needs Coup Strength Containment Vessel + 1 Legged Stool" on the "explosives analogy to AI breakthrough"

      "xAI has been using an enormous data center in Memphis containing around 200,000 GPUs to train Grok 3. In a post on X, Musk claimed Grok 3 was developed with “10x” (or so)* more computing power than its predecessor, Grok 2, using an expanded training set that includes filings from court cases— and more."
      https://techcrunch.com/2025/02/17/elon-musks-ai-company-xai-releases-its-latest-flagship-ai-grok-3/

      ... than its predecessor! Lmuskfao!

  3. My brain hurts! Too fluxy & fast.

    "Mira Murati is launching her OpenAI rival: Thinking Machines Lab
    Thinking Machines Lab has hired a bunch of leaders from OpenAI.
    by Kylie Robison
    Feb 18, 2025
    https://www.theverge.com/ai-artificial-intelligence/614621/mira-murati-thinking-machines-lab-openai-competitor-launch

    Leaders?!

  4. RE "Yuval Noah Harari"

    Yuval Harari, affiliated with Schwab's WEF's [https://archive.md/vrZGf] and one of WEF's frontman psychopaths [https://www.bitchute.com/video/Alhj4UwNWp2m] and who is sold as an intellectual "genius" or "prophet" by this crazy world, is the person who called you and me and all other commoners “useless people” [https://archive.ph/KlOKx] --- while millions of those "useless people" have been buying and recommending his books like candy (to learn his "lessons"), serving him very usefully. It's one proof that most people anywhere are stupid and crazy (while "thinking" they're intelligent and non-crazy).

    The WEF cabal of psychopaths, including Harari, has always promoted artificial intelligence, or AI. That they now, starting in early 2023, suddenly and supposedly have a change of heart makes it clear that their warnings about AI and their calls to have it regulated are just a manipulative tactic to deceive the public, again.

    The manipulative "AI open letter" scheme is part of The Hegelian Dialectic in action: problem-reaction-solution.

    This bogus letter campaign is meant to raise public fear/hype/panic about an alleged big "PROBLEM" (which they helped to create in the first place) so that the public demands (REACTION) that governments regulate this technology, and they then provide the "SOLUTION" for their own interests and agendas... because... all governments are owned by the leading psychopaths in power .... https://www.rolf-hefti.com/covid-19-coronavirus.html

    What a convenient self-serving trickery ... of the ever foolish public.

    “Who masters those technologies [eg artificial intelligence (AI)] —in some way— will be the master of the world.” --- Klaus Schwab, at the World Government Summit in Dubai, 2023

    “COVID is critical because this is what convinces people to accept, to legitimize, total biometric surveillance.” --- Yuval Noah Harari, WEF Forum member [https://archive.md/vrZGf]
