Monday, June 1, 2026

Surprise! Why it was so easy for executives and VCs to hijack the AI revolution

This is a fragment from a longer post in my ongoing commentary on Tyler Cowen's recent monograph on the Marginal Revolution.

* * * * *

OpenAI released GPT-3 in 2020 to a limited audience of insiders, who recognized that it represented a breakthrough. This level of performance came as a surprise. No one predicted it. GPT-3 was scaled up from GPT-2, which was in turn scaled up from GPT-1, but no one was making explicit predictions about the level of performance to be achieved at each step. These were experiments: “Let’s try it and see what happens.” That’s fine. That’s a good way to make progress, to try things out and see what happens. But don’t mistake a lucky trial for genuine knowledge.

Cowen mentioned GPT-3 on Marginal Revolution on July 19, and then published a Bloomberg column on it on July 21, which he excerpted in Marginal Revolution the next day: “...think of GPT-3 as giving computers a facility with words that they have had with numbers for a long time, and with images since about 2012.” I published a working paper in August, GPT-3: Waterloo or Rubicon? Here be Dragons, which I’ll discuss a bit later.

Two and a half years later, in November of 2022, OpenAI released ChatGPT to the general public. It spread like wildfire. Now the proverbial everyone witnessed what only a small group had witnessed in the summer of 2020. The machine speaks. Sorta’. But more convincingly than any machine had spoken before and in a way that had unimaginable implications for the future.

A threshold HAS been crossed, but it is not, so far as I can see, a threshold in understanding. It is a threshold in performance along a continuous line of scientific understanding and engineering design and construction, something I have documented in some detail in a recent working paper, The Origins of LLMs. As far as I can tell, there has been no paradigm shift, in Thomas Kuhn’s sense, no rank shift, in terms of cognitive rank theory. There were no fundamentally new ideas in the world by, say, late July of 2020 as a consequence of consolidating GPT-3 and making it available in limited release.

“What about the scaling hypothesis?” you might ask. “Isn’t that new?” Perhaps the phrase is new, but the idea was there in Rich Sutton’s famous 2019 essay, The Bitter Lesson. Given the nature of computing, scaling up is not trivial. Hundreds if not thousands of technical details need to be worked out as the size of the training corpus increases by factors of 10 or more, time after time, and as more and more GPUs are ganged together to assemble the computing power needed. But there has been no gain in fundamental understanding, not of machine learning, not of artificial neural nets, and certainly not of language and cognition.

Consequently our sense of possibility has expanded enormously, while our knowledge and deep understanding have remained the same. And that is what has allowed the field to be captured by businessmen, executives and venture capitalists, who have little understanding of or interest in the underlying conceptual issues. Scaling is something they understand.

Hype the dramatically increased performance and collect the cash. Purchase and deploy more resources now, reap far greater profits in a decade. Everything else is noise and friction. 

But what if [the Dread] Gary Marcus and other critics are right? What if scaling LLMs is not adequate? What happens to all those investments then?

2 comments:

  1. A stalemate in development? Well, it is easy to see that communities are fighting hard to keep data centers from being built in their neighborhoods. That's something tangible the oligarchs have to deal with.

    1. Given the amount of money that's being invested, if these companies don't turn a profit we could be facing a financial crisis comparable to the 2008 crisis.
