Alexander Doria has a recent post, The Model is the Product, that’s gotten me to think about future developments in a different way:
There has been a lot of speculation over the past few years about what the next cycle of AI development could be. Agents? Reasoners? Actual multimodality?
I think it’s time to call it: the model is the product.
All current factors in research and market development push in this direction.
- Generalist scaling is stalling. This was the whole message behind the release of GPT-4.5: capacities are growing linearly while compute costs are on a geometric curve. Even with all the efficiency gains in training and infrastructure of the past two years, OpenAI can’t deploy this giant model at remotely affordable pricing.
- Opinionated training is working much better than expected. The combination of reinforcement learning and reasoning means that models are suddenly learning tasks. It’s not machine learning, it’s not a base model either, it’s a secret third thing. It’s even tiny models getting suddenly scary good at math. It’s coding models no longer just generating code but managing an entire code base by themselves. It’s Claude playing Pokemon with very poor contextual information and no dedicated training.
- Inference costs are in free fall. The recent optimizations from DeepSeek mean that all the available GPUs could cover a demand of 10k tokens per day from a frontier model for… the entire earth population. There is nowhere near this level of demand. The economics of selling tokens no longer works for model providers: they have to move higher up in the value chain.
This is also an uncomfortable direction. All investors have been betting on the application layer. In the next stage of AI evolution, the application layer is likely to be the first to be automated and disrupted.
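The third bullet, about inference costs, rests on a rough capacity calculation. Here is a back-of-the-envelope sketch of that kind of arithmetic in Python; every figure in it (the GPU fleet size, the per-GPU throughput) is an assumption chosen for illustration, not a number taken from Doria’s post:

```python
# Back-of-the-envelope check of the inference-capacity claim above.
# Every number here is an illustrative assumption, not a figure from Doria's post.

world_population = 8_000_000_000          # people
tokens_per_person_per_day = 10_000        # the "10k tokens per day" in the quote

gpus_available = 1_000_000                # assumed fleet of inference-class GPUs
tokens_per_gpu_per_second = 10_000        # assumed post-optimization throughput

seconds_per_day = 86_400

daily_demand = world_population * tokens_per_person_per_day
daily_supply = gpus_available * tokens_per_gpu_per_second * seconds_per_day

print(f"demand: {daily_demand:.2e} tokens/day")
print(f"supply: {daily_supply:.2e} tokens/day")
print(f"supply is {daily_supply / daily_demand:.1f}x demand under these assumptions")
```

Under those assumed numbers supply comes out roughly an order of magnitude above demand, which is the shape of the argument: the constraint is no longer compute for inference, so selling raw tokens stops being the business.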
He goes on to explain why he’s saying that, and then adds:
...the current training ecosystem is very tiny. You can count all these companies on your hands: Prime Intellect, Moondream, Arcee, Nous, Pleias, Jina, the HuggingFace pretraining team (actually tiny)… Along with a few more academic actors (Allen AI, Eleuther…) they build and support most of the current open infrastructure for training. In Europe, I know that at least 7-8 LLM projects will integrate the Common Corpus and some of the pretraining tools we developed at Pleias — and the rest will be fineweb, and likely post-training instruction sets from Nous or Arcee.
There is something deeply wrong in the current funding environment. Even OpenAI senses it now. Lately, there has been some palpable irritation at the lack of “vertical RL” in the current Silicon Valley startup landscape. I believe the message comes straight from Sam Altman and will likely result in some adjustment in the next YC batch, but it points to a larger shift: soon the big labs’ select partners won’t be API customers but associated contractors involved in the earlier training stages.
If the model is the product, you cannot necessarily build it alone. Search and code are easy low-hanging fruit: they have been major use cases for two years, the market is nearly mature, and you can ship a new Cursor in a few months. Now, many of the most lucrative AI use cases of the future are not at this advanced stage of development — typically, think about all those rule-based systems that still rule most of the world economy… Small dedicated teams with cross-expertise and a high level of focus may be best positioned to tackle this — eventually becoming potential acquihires once the initial groundwork is done. We could see the same pipeline on the UI side: some preferred partners getting exclusive API access to closed specialized models, provided they get on the road for business acquisition.
Note that I have been arguing that LLMs are a digital wilderness. In a post from December 29, 2022, not long after ChatGPT was released, I quoted from an article by Ted Underwood.
In his penultimate paragraph Underwood notes:
I have suggested that approaching neural models as models of culture rather than intelligence or individual language use gives us even more reason to worry. But it also gives us more reason to hope. It is not entirely clear what we plan to gain by modeling intelligence, since we already have more than seven billion intelligences on the planet. By contrast, it’s easy to see how exploring spaces of possibility implied by the human past could support a more reflective and more adventurous approach to our future. I can imagine a world where generative models of culture are used grotesquely or locked down as IP for Netflix. But I can also imagine a world where fan communities use them to remix plot tropes and gender norms, making “mass culture” a more self-conscious, various, and participatory phenomenon than the twentieth century usually allowed it to become.
These digital wilderness regions thus represent opportunities for discovery and elaboration. Alignment is simply one aspect of that process.
And by alignment I mean more than aligning the AI’s values with human values; I mean aligning its conceptual structure as well. That’s where “old school” symbolic computing enters the picture, especially language. Language – not the mere word forms available in digital corpora, but word forms plus semantics and syntactic affordances – is one of the chief ‘tools’ through which young humans are acculturated and through which human communities maintain their beliefs and practices. The full powers of language, as treated by classical symbolic systems, will be essential for “domesticating” the digital wilderness and developing it for human use.
What do you do with a wilderness? You explore it, map it, enclose parts of it, and then develop them. That’s a job for “small dedicated teams with cross-expertise and a high level of focus.”
I returned to the wilderness theme in the report I recently posted on the year-and-a-half I spent investigating ChatGPT: ChatGPT: Exploring the Digital Wilderness, Findings and Prospects. There I said:
To return to the metaphor with which I began this report, these LLMs, these so-called Foundation Models, are a digital wilderness. Wild and untamed, but rich in resources. Ongoing research in mechanistic interpretability is one way to explore that wilderness, to map it. I have been presenting a different, complementary mode of exploration and mapping. We need to map the territory, settle it, and domesticate it. By that I mean develop symbolic models to operate on and in the territory. While we may start developing those models through hand-coding, we will have to develop programmatic techniques if we want to cover the territory, which, of course, will be ever expanding.
As far as I can tell, Doria is imagining that his small focused teams will be working within existing architectural and programmatic frameworks or obvious extensions of them. I’m imagining something a bit different, something that requires us to understand the internal operations of LLMs. Yet I have little trouble imagining that here and there one or more of his small focused teams will begin making sense of those internal operations beyond what is displayed in current interpretability research. Perhaps they will lay the foundations for the kind of programmatic techniques I am imagining.
H/t Tyler Cowen.