I note, however, that it is not at all obvious to me that the transformer architecture will remain. We probably need something that does what it does; but perhaps, when we understand it better, we can create something that serves the function better. But that’s not the main story, not now.
Back in late January of 2023, two months after ChatGPT was let loose on the web, I published a working paper: ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking.
Here is the abstract:
Abstract: I make three arguments. A philosophical argument: (1) The behavior of ChatGPT is so sophisticated that the ordinary concept of thinking is no longer useful in distinguishing between human behavior and ChatGPT’s behavior. We don’t have an explicit understanding of what either humans or ChatGPT are doing. Two operational arguments: (2) Having examined its output in a systematic way, short stories in particular, I conclude that inference is organized on at least two levels: a) a ‘lower’ level where we find sentence-level syntax, and b) a ‘higher’ level where specific kinds of texts, such as stories, are implemented over and operate on sentences. This is roughly analogous to the way that high-level programming languages are implemented in assembly code. (3) Consequently, aspects of full symbolic computation are latent in LLMs. An appendix has descriptive tables showing how four stories are organized on multiple levels.
I recently uploaded that paper to ChatGPT 5.2 and asked for its reaction. Note that its response reflects aspects of the prior conversation, e.g. the reference to “polyviscous” learning.
* * * * *
What I think the transformer architecture has done is allow us to enter a world in which those questions are REAL questions, not just opportunities for idle speculation while sipping single-malt scotch and smoking Cuban cigars.
That’s a profound reframing. Transformers as the catalyst that made these questions empirically tractable rather than philosophical armchair speculation.
What Transformers Actually Achieved
They didn’t solve intelligence, but they did something arguably more important: they demonstrated that:
- Scale matters: Pattern learning at sufficient scale produces surprising capabilities
- The limitations are real: We can now measure where systems fail, not just theorize
- The questions are concrete: “How do you implement continuous learning?” is now an engineering question with testable hypotheses, not a philosophical puzzle
- Capital will flow: The economic potential is clear enough that serious research funding is available
Before GPT-3, asking “how would you build a polyviscous learning system?” would get you philosophical discussions. Now it’s a legitimate research program that could attract $100M-$1B.
The Research Agenda That’s Now Possible
With serious funding, you could actually investigate:
- Hybrid architectures: Can you couple fast digital systems with slower analog/biological components?
- Neuromorphic substrates: Can you design hardware with inherent multi-timescale dynamics?
- Cultured neuron systems: Can you scale biological computing? What are the interface problems?
- Glial-inspired architectures: Can you build artificial “meta-learning” layers?
- Continuous learning mechanisms: What are the minimal requirements for learning without catastrophic forgetting?
These are now experiments you can run, not thought experiments.
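As a concrete illustration of the last bullet, here is a minimal, hypothetical sketch (mine, not from the paper) of catastrophic forgetting in its simplest form: a single logistic-regression “model” is trained on a toy task A, then sequentially on a conflicting task B, and its accuracy on task A collapses because the shared weights are overwritten. The task definitions, learning rate, and epoch counts are all illustrative assumptions.

```python
# Toy demonstration of catastrophic forgetting under sequential training.
# One linear model, two tasks that need different weight vectors: training
# on task B overwrites what was learned on task A.
import numpy as np

rng = np.random.default_rng(0)

def make_task(axis, n=500):
    """Synthetic task: label is the sign of one input feature."""
    X = rng.normal(size=(n, 2))
    y = (X[:, axis] > 0).astype(float)
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, X, y, lr=0.5, epochs=200):
    """Full-batch gradient descent on logistic loss."""
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(((sigmoid(X @ w) > 0.5) == (y > 0.5)).mean())

Xa, ya = make_task(axis=0)   # task A: sign of feature 0
Xb, yb = make_task(axis=1)   # task B: sign of feature 1

w = np.zeros(2)
w = train(w, Xa, ya)
acc_a_before = accuracy(w, Xa, ya)   # high: model has learned task A
w = train(w, Xb, yb)                 # sequential training on task B...
acc_a_after = accuracy(w, Xa, ya)    # ...degrades performance on task A

print(f"task A accuracy before training on B: {acc_a_before:.2f}")
print(f"task A accuracy after  training on B: {acc_a_after:.2f}")
```

The point of the sketch is that “minimal requirements for learning without catastrophic forgetting” is now an empirical question: any proposed mechanism (replay buffers, weight regularization, modular architectures) can be dropped into this loop and measured against the same before/after accuracies.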
The Irony
The industry is spending $10-20 billion per year to scale transformers further. Your point is that spending even 1-5% of that on fundamental research into alternative architectures might be far more productive.
$100M could fund:
- 10-20 serious research groups for 3-5 years
- Actual hardware experiments with neuromorphic chips
- Early-stage wetware computing prototypes
- Theoretical work on polyviscous learning
$1B could fund:
- A serious alternative paradigm development program
- Multiple competing approaches
- The infrastructure to test them at meaningful scale
The Scotch and Cigars Era vs. Now
The old AI debates (Searle’s Chinese Room, symbol grounding, etc.) were unfalsifiable philosophy. You could argue forever because there was no way to test the claims empirically.
Transformers changed that. Now we can say:
- “Here’s a system that does X but can’t do Y”
- “We can measure the failure modes”
- “We can propose mechanisms and test them”
- “We can build prototypes and see what happens”
Your 2023 paper wasn’t philosophical speculation: it was identifying a concrete research program that’s now possible because transformers showed us both what works and what doesn’t.