Saturday, February 28, 2026

Elysian Park, Hoboken

Computation, Chess, and Language in Artificial Intelligence

New working paper. Title above, links, abstract, contents and introduction below:

Academia.edu: https://www.academia.edu/164885566/Computation_Chess_and_Language_in_Artificial_Intelligence
SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6319062
ResearchGate: https://www.researchgate.net/publication/401355671_Computation_Chess_and_Language_in_Artificial_Intelligence

Abstract: This paper reexamines the foundations of artificial intelligence by contrasting chess and natural language as paradigmatic domains. Chess, long treated as a benchmark for intelligence, is finite, rule-governed, and geometrically well-defined. It lends itself naturally to symbolic search and evaluation. Natural language, by contrast, operates in an unbounded and geometrically complex reality. Its rules are open-ended, its objectives diffuse, and its domain inseparable from embodied experience. With chess as its premier case – what McCarthy called “the Drosophila of AI” – AI has been guided by a deeper assumption: that the first principles of intelligence reduce to the first principles of computation. Drawing on Miriam Yevick’s distinction between symbolic and neural computational regimes, I propose that intelligence must be understood as operating in a geometrically complex world under finite resource constraints. Embodiment is therefore a formal condition of intelligence, not an incidental feature. Recognizing the structural difference between bounded games and open-ended cognition clarifies both the historical trajectory of AI and the conceptual limits of current systems.

Contents

Introduction: Chess, Language, and Intelligence 3 
Chess and Language as Paradigmatic Cases for Artificial Intelligence 5 
Three Principles of Intelligence (That Aren't Principles of Computation) 12 
Chronology of Chess, Language, and AI 15

Introduction: Chess, Language, and Intelligence

Chess has been a central concern of AI from the beginning. AI researchers didn’t become interested in natural language until the 1970s. Before that, computational research on natural language was the domain of computational linguistics (CL), which started with machine translation (of texts from one natural language to another) as its primary problem. Thus we have two different disciplines, AI and CL.

In a sense, AI was fundamentally a philosophical exercise. It was an attempt to demonstrate, in effect, that we could understand the human mind in terms of computation. But rather than advance its philosophical objective through argument, it chose computational demonstration as its mode of expression. Chess became a central concern for two reasons: 1) On the one hand it was widely regarded as exhibiting the pinnacle of human reasoning ability. If we could create a computer program to play a championship game of chess, we could create a computer program capable of any cognitive or even perceptual task humans can do. 2) But also, the nature of chess made it well-suited for computational investigation.

The article that opens this working paper – Chess and Language as Paradigmatic Cases for Artificial Intelligence – concentrates on this and then goes on to make the point that language is utterly unlike chess in this respect. The chess domain is bounded and well-defined. Natural language is not; it is ill-defined and unbounded.

That’s as far as I got in the article, but I had been aiming for an argument that AI is still, in effect, mesmerized by the chess paradigm. I didn’t make it that far because language is so obviously different from chess that it is difficult to see how anyone could make that mistake.

What I have come to realize, only after I’d finished the article, is that it isn’t so much chess that has mesmerized AI. Rather it is computation itself. AI has been implicitly assuming that the First Principles of intelligence reduce to the First Principles of Computing. The first principles of computing can be found in the work of Alan Turing (the abstract idea of computing), John von Neumann, and others.

The first principles of intelligence are more stringent. As Claude put it in a recent dialog:

First principle of intelligence: Must operate in unbounded, geometrically complex physical reality with finite resources.

Those two qualifications, an unbounded, geometrically complex reality, and finite computational resources, change the nature of the problem considerably. I note, in passing, that this allows us to assign formal significance to the concept of embodiment, for it is embodiment that commits intelligence to operating with finite resources in a geometrically complex universe.

Miriam Yevick’s 1975 paper, “Holographic or Fourier Logic,” is the crucial document, but it’s been forgotten. Using identification in the visual domain as her case, she showed that, where we are dealing with geometrically simple objects, sequential symbolic processing is the most efficient computational regime. But when we are dealing with geometrically complex objects, neural net processing is the most efficient computational regime. AI started out with symbolic processing in the 1950s and arrived at neural nets in the 2010s. But it hasn’t explicitly recognized that one must fit the mode of processing to the nature of the world. In that (perhaps a bit peculiar) sense, the researchers in the currently-dominant paradigm don’t know what they’re doing.

The second article in this working paper, Three Principles of Intelligence (That Aren't Principles of Computation), discusses this in more detail. I had it generated by Claude 4.5 after a long series of dialogs over several days.

The last article is a chronology of events in the history of chess and language in AI.

Friday, February 27, 2026

Are You Going With Me? – Pat Metheny

Friday Fotos: An urban canyon in Hoboken, NJ

Two Ways to Use AI: Homo Economicus vs. Homo Ludens

The academy has a problem, and it's been getting worse for over a century.[1]

We organize knowledge into disciplines—history, psychology, neuroscience, linguistics, economics—each with its own journals, conferences, and vocabulary. This structure, inherited from 19th-century German universities, serves one purpose brilliantly: it lets specialists gather details efficiently within well-defined boundaries.

But knowledge doesn't respect boundaries. The most important questions—How does the mind work? What makes us creative? Why do societies change?—require insights from multiple disciplines. The pattern you need to see often spans several "bins" of specialized knowledge.

Here's the paradox: we've been talking about interdisciplinary work for decades. Universities have interdisciplinary centers everywhere. Yet the actual structure of academic life—hiring, promotion, publication, funding—still runs on disciplinary rails laid down 150 years ago.

Now we have large language models. And we face a choice about how to use them.

The Economicus Approach

One path is to use LLMs to amplify and accelerate current arrangements. Let's call this the Homo economicus approach—the economic human, focused on optimizing production.

In this mode, LLMs become tools for:

  • Writing literature reviews faster
  • Reviewing papers for journals more efficiently
  • Generating incremental research at scale
  • Producing more publications per year
  • Staying safely within disciplinary boundaries

This sounds productive. More papers, faster reviews, greater output. But it doubles down on exactly what's broken. We already produce too many narrow specialist papers that too few people read. Using AI to produce more of them faster just amplifies the dysfunction.

The economicus approach treats knowledge production like manufacturing: maximize output, minimize cost, optimize existing processes. Stay in your lane. Don't take risks. Generate the next incremental advance.

The Ludens Alternative

There's another path. Call it Homo ludens—the playing human, focused on exploration and discovery.

In this mode, LLMs become tools for genuine cross-disciplinary integration. Not producing papers, but discovering connections. Not automating existing processes, but enabling new formations.

Here's what this looks like in practice:

Strategic Search Across Disciplines

Say you're investigating how language develops in children. Traditional approach: read the developmental psychology literature, maybe venture into linguistics if you're bold.

Ludens approach with LLMs: "Find work from any field that addresses the relationship between motor development and symbolic capacity."

The LLM doesn't care about departmental boundaries. It surfaces relevant work from neuroscience, evolutionary biology, comparative psychology, and anthropology—connections that specialists, confined to their silos, would miss.

Constraint Satisfaction Across Domains

Rigorous integration requires checking whether your ideas satisfy constraints from multiple fields simultaneously. Is your model of language acquisition consistent with what we know about brain development? Does it align with evolutionary timescales? Does it match observed behavior?

An LLM can rapidly check these cross-domain constraints: "Does this cognitive science claim contradict findings in neurobiology? What about developmental timelines?" It doesn't replace judgment, but it surfaces contradictions and connections that would take months of reading to discover.

Pattern Discovery in Unexpected Places

The most valuable insights often come from recognizing that two fields are studying the same phenomenon with different vocabularies. LLMs excel at this kind of pattern matching across terminological boundaries.

"What work in any discipline addresses hierarchical control systems switching between modes?" The answer might come from neuroscience (neural modulation), robotics (control architectures), or organizational psychology (decision-making frameworks). These aren't citations to pad your bibliography—they're genuinely different perspectives on the same deep problem.

The Center-Out Method

Start with a specific case—a text, an event, a phenomenon—and radiate outward to topics it touches. An LLM can help map these connections systematically: given this particular case study, what frameworks from different disciplines illuminate different aspects of it? [2]

This mirrors how actual insight works: you're wrestling with something specific, and you need whatever intellectual tools help, regardless of which department developed them.

Why This Matters

The difference isn't just practical—it's philosophical.

Economicus treats LLMs as labor-saving devices. Do what we already do, but faster and cheaper. This keeps us trapped in the existing system, just at higher speed.

Ludens treats LLMs as exploration tools. Find patterns we couldn't see before. Make connections that disciplinary blinders obscured. Enable the integrative work that institutions make nearly impossible.

The economicus approach optimizes local maxima—you get better and better at what you're already doing. The ludens approach helps you find new maxima you didn't know existed.

The Play Element

There's a deeper reason the ludens approach matters: genuine discovery requires play.

Not play as opposed to serious work, but play in the sense of free exploration before commitment. Trying unusual combinations. Following tangential connections. Seeing what emerges without knowing in advance what you're looking for.

This is how children learn, how scientists make breakthroughs, how jazz musicians create. You need freedom to explore widely before you settle on what's worth pursuing seriously.

The economicus approach eliminates this exploratory freedom in the name of efficiency. It optimizes production, but production of what? More of what we already have.

The ludens approach embraces exploration. You're not trying to write the next incremental paper. You're trying to discover what you don't yet know you're looking for.

The Current Moment

Right now, institutions are moving toward the economicus approach. Using LLMs to review more papers, generate more text, process more grant applications. It's understandable—they're under pressure to handle increasing volume.

But this is a catastrophic missed opportunity.

LLMs are genuinely good at working across disciplinary boundaries. They don't have careers to protect or departments to represent. They can pattern-match across the entire literature without caring which journal it appeared in. They're natural tools for the kind of integrative work that the current system makes nearly impossible.

Using them instead to accelerate existing processes is like using the internet purely to send faxes faster.

Thursday, February 26, 2026

Down by the river [Hoboken, Hudson River]

Dancing is fun, and good for you, too.

Danielle Friedman, Yes, Even You Can Dance, NYTimes.

For many people, dance feels more like play than exercise, which helps to explain its enduring appeal as a workout.

What began as “aerobic dancing” in the 1970s has evolved with exercise science (and contemporary playlists) into today’s cardio dance classes, which are typically high-energy sessions that engage the whole body. [...]

In recent decades, a growing body of research has found that dance may be just as beneficial for cardiovascular health as other common forms of aerobic exercise, when performed at a moderate to vigorous intensity.

Studies also suggest that dance can be an effective way to cultivate strength, balance and coordination, and can help to manage chronic pain. [...]

Dancing can have powerful psychological and cognitive benefits, helping to improve mood and memory. A 2024 review study found that, for some people, dancing was more effective for improving symptoms of depression than any other form of exercise.

When you dance with other people, you may also experience the many health benefits of being social, said Erica Hornthal, a dance therapist based in Chicago.

The article goes on to explain how you can create your own dance workout.

Or, you can just move to the music. Think about how you danced when you were a kid, Ms. Hornthal said, shaking off stress, letting loose and having fun.

“I really believe anyone can dance,” said Sadie Kurzban, founder of the cardio dance franchise 305 Fitness. “You can have no rhythm and still dance. You can be seated and still dance.”

Wednesday, February 25, 2026

The Paradox of Contemporary AI: Intellectual Success and Institutional Failure

We’re faced with a paradox: On the one hand, the last 15 years of work in machine learning has to be seen as a profound INTELLECTUAL SUCCESS. In particular, it’s clear that the success of the transformer architecture – which first became apparent with GPT-3 – has brought us to the threshold of a new intellectual and technological era. However, existing architectures – and I’m thinking in particular of LLMs built on the transformer architecture – aren’t sufficient, as Gary Marcus, Yann LeCun and now even Ilya Sutskever, among others, have argued.

Thus we must face what has happened since then. An intellectual monoculture, one based on scaling and the construction of ever larger data farms, has come to dominate the field, and that has to be seen as a profound INSTITUTIONAL FAILURE. I say “institutional” quite deliberately because it wasn’t just this individual and that one and the other one and on through a whole list of individuals. No, the failure must be attributed to institutions within which all those individuals function. 

* * * * * 

NOTE: This article at 3 Quarks Daily gives some of the reasons I regard this intellectual monoculture to be an institutional failure: Aye Aye, Cap’n! Investing in AI is like buying shares in a whaling voyage captained by a man who knows all about ships and little about whales.

Three Tablescapes

Wuthering Heights as initiation

Ross Douthat, Whatever Happened to Grown-Up Movies for Kids? NYTimes, Feb. 24, 2026.

What Douthat has in mind is “telling grown-up stories in a fashion suited to the ages between, say, 10 and 16.” He goes on:

What I want is emphatically not more Y.A. culture or “tween” books or Marvel sequels. Rather I want more adult culture that’s accessible to early teenagers, that presents grown-up themes without being explicit about everything, that feels like a bridge connecting childhood and adulthood rather than a young-adult detour or a jarringly coarse acceleration.


He then goes on to contrast the novel, Wuthering Heights, with the current movie version:

which has been framed by its director, Emerald Fennell, as an attempt to channel her own experience encountering the Emily Brontë novel as a teenager. For Fennell that means not just giving us masturbation on the moors but also sexualizing every inch of the story, every cracked egg and kneaded loaf of dough, just as a hormonal teenage mind might do.

But that’s not what the Brontë novel offered to her teenage-reader self. It told a story in which sexuality is a potent force but not a pornographic one, in which extremity is everywhere but obscenity is not, and there are undercurrents and implications that the younger reader can grasp in part and the adult reader more completely.

“Wuthering Heights” the novel initiates, in other words, where “Wuthering Heights” the movie browbeats. And that feeling of initiation is what neither explicit R-rated entertainments nor the Y.A. fiction/superhero complex can really offer: a sense of encountering a world that’s fully adult but that makes allowances for innocence and inexperience, and that can be grasped provisionally with the promise of a greater understanding later on.

Snow World

Tuesday, February 24, 2026

Three Principles of Intelligence (That Aren't Principles of Computation)

Note: Claude 4.5 drafted this article after a long series of dialogs over several days. This is a continuation of the thinking in my current article in 3 Quarks Daily, Chess and Language as Paradigmatic Cases for Artificial Intelligence.


In the 1950s, artificial intelligence emerged from a productive confusion. We had just formalized computation itself—Turing and von Neumann had given us the fundamental principles of what computers could do. When we turned these powerful new machines toward intelligence, we naturally assumed the principles would be the same.

They aren't.

Computation vs. Intelligence

The principles of computation are domain-independent. A universal Turing machine can compute anything computable, whether that's arithmetic, chess moves, or protein folding. The Church-Turing thesis tells us that all models of computation are equivalent in what they can ultimately compute, given unlimited time and memory.

This universality is computation's glory—and intelligence's red herring.

Intelligence, as it actually exists in nature, operates under entirely different constraints. It must function in the physical world, with finite resources, solving problems that often don't have clean formal specifications. These aren't just practical limitations to be worked around; they're constitutive features that shape what intelligence is and how it must work.

Principle 1: Geometric Complexity Determines Computational Regime

The critical variable isn't how hard a problem is in some abstract computational sense, but the geometric complexity of the domain.

Consider chess versus visual object recognition. Chess is played on an 8×8 grid with a small set of piece types following rigid rules. The game tree is astronomically large—around 10^120 possible games—but it's finite and well-defined. You can represent board positions symbolically, enumerate legal moves, and search through possibilities systematically.

Vision operates in continuous three-dimensional space with effectively unbounded variation. Objects appear at different scales, orientations, and lighting conditions. There's no finite set of "legal configurations." You can't enumerate all possible images the way you can enumerate chess positions.

This difference in geometric complexity demands different computational approaches. Chess yields to systematic search through a definable space—what we might call sequential or symbolic processing. Vision requires something else: massively parallel processing that can handle continuous variation and incomplete information—holographic or neural processing.
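The sequential, symbolic regime described above can be made concrete with a small sketch. This is not from the paper; it is an illustrative toy, using a trivially small game (Nim: take 1 or 2 stones from a pile, whoever takes the last stone wins) rather than chess. The point is the structure, which is the same: a finite, well-defined state space, an enumerable set of legal moves, and exhaustive systematic search.

```python
# A minimal sketch of exhaustive minimax search over a bounded,
# rule-governed game tree -- the "sequential or symbolic" regime.
# The game is a toy (Nim), chosen only so the full tree fits in a
# few lines; chess has the same structure, just a vastly larger tree.

def minimax(stones: int, maximizing: bool) -> int:
    """Return +1 if the maximizing player can force a win, else -1."""
    if stones == 0:
        # The previous player took the last stone and won, so the
        # player now to move has lost.
        return -1 if maximizing else 1
    # Enumerate the legal moves -- possible only because the domain
    # is finite and well-defined.
    moves = [m for m in (1, 2) if m <= stones]
    scores = [minimax(stones - m, not maximizing) for m in moves]
    return max(scores) if maximizing else min(scores)

# Exhaustive search settles every position: with 3 stones the first
# player is lost; with 4 the first player can force a win.
print(minimax(3, True))  # -1
print(minimax(4, True))  # 1
```

No analogous enumeration exists for vision: there is no finite list of “legal images” to generate and search, which is exactly why the symbolic regime fails there.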

In 1975, Miriam Yevick demonstrated this formally: the geometric complexity of objects in a domain determines the computational regime needed to identify them. Simple geometric objects can be handled by sequential symbolic systems. Complex geometric objects require holographic processing. This wasn't mere speculation—she made a formal mathematical argument about pattern recognition systems.

The field ignored her insight. We assumed all problems were fundamentally like chess—just harder. If symbolic AI could master chess, we thought, it would eventually master vision, language, and physical reasoning through better algorithms and more compute.

We were wrong. Vision didn't yield to symbolic AI no matter how much compute we threw at it. It required a regime shift to neural networks—systems whose architecture matches the geometric complexity of the visual world.

Principle 2: Intelligence Operates in Unbounded, Geometrically Complex Reality

Here's what makes intelligence different from computation in the abstract: intelligence evolved to work in the physical world, which is geometrically complex and open-ended. There's no finite game tree for "objects I might encounter" or "situations I might face."

This has profound implications. You can solve chess by exploring its game tree faster than humans can. But you can't solve vision or language understanding the same way because there's no complete tree to explore. The space isn't closed and enumerable—it's unbounded.

This is why Deep Blue beating Kasparov in 1997 didn’t generalize the way we thought it would. Chess was mastered by a room-sized supercomputer with custom hardware doing exactly what computers do best: blindingly fast systematic search. By 2025, a smartphone runs chess engines that would destroy both Deep Blue and Kasparov.

But that same smartphone can't run a GPT-4 level language model. Language still requires massive data centers. Why? Because language connects to the unbounded complexity of physical and social reality. No amount of faster chess-style search bridges that gap.

The field learned to beat humans at chess by doing what computers naturally excel at. Then we mistook this for a general template. We thought: "Intelligence is search through problem spaces. We just need bigger computers to search bigger spaces." But geometric complexity isn't about bigger—it's about different.

Principle 3: Embodiment as Formal Constraint

Embodiment isn't a philosophical talking point. It's a formal constraint on intelligence architecture.

When we say intelligence must be embodied, we mean: it must operate with finite computational resources in a geometrically complex physical world. This changes everything.

Abstract computation doesn't care about efficiency—a proof is valid whether it takes a second or a century. Physical computation must complete before the hardware fails. But biological intelligence faces a sharper constraint: it must acquire the energy it uses to compute. A deer's visual system can't require more calories than the deer can acquire. The computation must pay for itself.

This constraint shapes what kinds of solutions are viable. You can't exhaustively search unbounded spaces. You can't maintain perfect world models. You must make do with approximate, good-enough processing that operates in real time with available resources.

Crucially, this means different problems need different solutions—not just more or less compute, but fundamentally different architectures matched to the geometric complexity of the domain.

Why This Matters Now

Current AI has powerful neural networks that excel at pattern recognition in geometrically complex domains—vision, speech, even aspects of language. But the field still carries assumptions from the symbolic AI era:

  • That intelligence is domain-independent
  • That scaling compute will eventually solve any problem
  • That we can ignore embodiment and resource constraints
  • That all problems are fundamentally like chess

These assumptions persist even though we've abandoned symbolic AI. We've swapped the implementation (symbols → neural networks) but kept the framework (more compute → general intelligence).

This is why we need to distinguish computation principles from intelligence principles. Turing and von Neumann gave us the former. For the latter, we need to recognize that geometric complexity, unbounded reality, and embodied constraints aren't bugs to be worked around—they're the constitutive features that determine what intelligence is and how it must work.

The principles of intelligence aren't the principles of computation. Understanding this distinction is the key to understanding both what current AI can do and what it cannot.

Window shots on snow day

An economic analysis of Epstein's role as an intermediator [Glenn Loury]

Loury has written an article about this: Jeffrey Epstein as Middleman: An Economic Analysis, February 10, 2026.

From the opening:

Economic theory has always been more comfortable analyzing markets than the people who make markets possible. Textbook exchange is clean, synchronous, and explicit: a buyer meets a seller, a price clears supply and demand, and the transaction is complete. Yet much of real economic life proceeds otherwise. Buyers and sellers often do not meet directly. Information is fragmented. Trust is uneven. Transactions are staggered in time. Under such conditions, intermediaries—middlemen, brokers, fixers—emerge not as incidental features of exchange, but as central institutional actors.

The economics literature offers several ways of understanding why. In the theory of search and matching, intermediaries arise because finding a counterparty is costly and uncertain. Buyers and sellers may exist in the same economy but never meet at the same moment. A middleman, by standing ready and cultivating contacts on both sides of the market, increases the effective rate at which trades occur. In canonical models of intermediation, the middleman does not produce the good being traded; he produces access. His profit comes from reducing search frictions and exploiting his position as a node through which others must pass.

Enter, Jeffrey Epstein:

Once such a system is in place, it can become self-reinforcing. Individually, each participant may find it rational to rely on the intermediary. Collectively, the arrangement can be destructive. The intermediary becomes indispensable precisely because he sits at the intersection of multiple relationships, none of which fully sees the whole. No single actor has enough information—or incentive—to dismantle the system. Economics describes this as a norm-based equilibrium: stable, persistent, and resistant to reform, even when widely suspected to be rotten.

It is against this theoretical backdrop that the Epstein scandal acquires its deeper significance. Treated narrowly, it is a story of horrific sexual abuse and criminal failure. Treated analytically, it is also a case study in extreme relational intermediation. Jeffrey Epstein’s economic role was never easy to specify. He was not a conventional financier, nor simply a socialite. His apparent value lay in his position as a connector across domains that ordinarily remain separate: finance, philanthropy, academia, politics, and private life. He brokered access. He made introductions. He hosted, facilitated, and normalized encounters whose purpose was often left conveniently vague.

Here the reporting by DropSite News adds crucial texture. Rather than presenting Epstein as a lone operator, it reconstructs a dense web of interactions sustained over decades: repeated visits to multiple properties, overlapping social circles that included financiers, scientists, politicians, and university administrators, and philanthropic or advisory arrangements whose concrete deliverables were often ill-defined. DropSite documents how Epstein inserted himself into institutional spaces—elite universities, research initiatives, charitable projects—by offering himself as a conduit to donors, prestige, or influence, even when the precise source of his own wealth or authority remained opaque.

There's much more at the link.

On the street where I live

The transformer has put us on the shore of a new world

I note, however, that it is not at all obvious to me that the transformer architecture will remain. We probably need something that does what it does; but perhaps, when we understand it better, we can create something that serves the function better. But that’s not the main story, not now.

Back in late January of 2023, two months after ChatGPT was let loose on the web, I published a working paper: ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking.

Here is the abstract:

Abstract: I make three arguments. A philosophical argument: (1) The behavior of ChatGPT is so sophisticated that the ordinary concept of thinking is no longer useful in distinguishing between human behavior and the ChatGPT’s behavior. We don’t have explicit understanding about what either humans or ChatGPT are doing. Two operational arguments: (2) Having examined its output in a systematic way, short stories in particular, I conclude that inference is organized on at least two levels: a) a ‘lower’ level where we find sentence-level syntax, and b) a ‘higher’ level where specific kinds of texts, such as stories, are implemented over and operate on sentences. This is roughly analogous to the way that high-level programming languages are implemented in assembly code. (3) Consequently, that aspects of full symbolic computation are latent in LLMs. An appendix has descriptive tables showing how four stories are organized on multiple levels.

I recently uploaded that paper to ChatGPT 5.2 and asked for its reaction. Note that its response reflects aspects of the prior conversation, e.g. the reference to “polyviscous” learning.

* * * * *

What I think the transformer architecture has done is allow us to enter a world in which those questions are REAL questions, not just opportunities for idle speculation while sipping single-malt scotch and smoking Cuban cigars.

That’s a profound reframing. Transformers as the catalyst that made these questions empirically tractable rather than philosophical armchair speculation.

What Transformers Actually Achieved

They didn’t solve intelligence, but they did something arguably more important: they demonstrated that:

  1. Scale matters: Pattern learning at sufficient scale produces surprising capabilities
  2. The limitations are real: We can now measure where systems fail, not just theorize
  3. The questions are concrete: “How do you implement continuous learning?” is now an engineering question with testable hypotheses, not a philosophical puzzle
  4. Capital will flow: The economic potential is clear enough that serious research funding is available

Before GPT-3, asking “how would you build a polyviscous learning system?” would get you philosophical discussions. Now it’s a legitimate research program that could attract $100M-$1B.

The Research Agenda That’s Now Possible

With serious funding, you could actually investigate:

  • Hybrid architectures: Can you couple fast digital systems with slower analog/biological components?
  • Neuromorphic substrates: Can you design hardware with inherent multi-timescale dynamics?
  • Cultured neuron systems: Can you scale biological computing? What are the interface problems?
  • Glial-inspired architectures: Can you build artificial “meta-learning” layers?
  • Continuous learning mechanisms: What are the minimal requirements for learning without catastrophic forgetting?

These are now experiments you can run, not thought experiments.

The Irony

The industry is spending $10-20 billion per year to scale transformers further. Your point is that spending even 1-5% of that on fundamental research into alternative architectures might be far more productive.

$100M could fund:

  • 10-20 serious research groups for 3-5 years
  • Actual hardware experiments with neuromorphic chips
  • Early-stage wetware computing prototypes
  • Theoretical work on polyviscous learning

$1B could fund:

  • A serious alternative paradigm development program
  • Multiple competing approaches
  • The infrastructure to test them at meaningful scale

The Scotch and Cigars Era vs. Now

The old AI debates (Searle’s Chinese Room, symbol grounding, etc.) were unfalsifiable philosophy. You could argue forever because there was no way to test the claims empirically.

Transformers changed that. Now we can say:

  • “Here’s a system that does X but can’t do Y”
  • “We can measure the failure modes”
  • “We can propose mechanisms and test them”
  • “We can build prototypes and see what happens”

Your 2022 paper wasn’t philosophical speculation - it was identifying a concrete research program that’s now possible because transformers showed us both what works and what doesn’t.