Wednesday, April 1, 2026

Claude: On the (late) discovery of the chess tree and things one can infer around and about that tree [MR-3c]

Late in his fourth chapter, Cowen introduces chess as an example of what contemporary AI can do. As you may know, chess has been a central interest of AI, so much so that John McCarthy, the mathematician who coined the term “artificial intelligence,” wrote an article entitled “Chess as the Drosophila of AI.” You may also know that chess has been a central interest of Cowen’s. He was a chess champion in his youth and he follows the game closely.

Recognition of the chess tree comes late 

As it happens, chess presents us with one of those examples that Tyler finds so interesting in this book (The Marginal Revolution: Rise and Decline, and the Pending AI Revolution). It was during my freshman year at Johns Hopkins, I believe, that I read some Dover Publications book that was an omnibus presentation of information, cybernetics, and computing. I forget both the author and the exact title, but I remember two things. 1) In its presentation of computing, it talked about analog computing and digital computing. That was common until not long after personal computers arrived; after that, articles and books about “computers for dummies” stopped talking about analog computing and concentrated on digital (big mistake, IMO, but that’s a different story). 2) It talked about chess and made the point that, from an abstract point of view, chess was just like tic-tac-toe, an utterly trivial game. Both games are finite, and their possible games can be arranged in a tree structure. But the chess tree is so large that not even the largest computer can list all the games. I thought about it a little, got the point, and that has stuck with me ever since.

That was back in the mid-1960s. The largest computers of that era were tiny in comparison to the Brobdingnagian behemoths being used to train contemporary AIs, but even those behemoths are far too small to enumerate the chess tree.
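To make the scale concrete, here is a rough back-of-the-envelope sketch in Python. The branching factors are conventional ballpark figures (20 legal first moves for each side, then roughly 30 legal moves per position), not numbers computed from the rules, so treat the result as an illustration of growth, not a precise count.

```python
# Rough growth of the chess tree over the first ten plies (five full moves).
# Branching factors are ballpark figures, not derived from the rules:
# 20 legal first moves per side, then about 30 legal moves per position.
paths = 1
branching = [20, 20] + [30] * 8
for b in branching:
    paths *= b

print(f"~{paths:.2e} move sequences after {len(branching)} plies")  # ~2.62e+14
```

And that is only ten plies; typical games run to 80 plies or more, and Shannon's famous estimate of roughly 10¹²⁰ possible games comes from the same style of reasoning.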

The thing is, and here we’re getting to the Tylerian point, chess has been played by thoughtful people for centuries. Why was it only in the early 20th century that its tree-like structure was recognized? The tree structure is so simple, so basic.

So I put the question to Claude, who answered. And then we went on from there. I’ll present that whole dialog shortly, but I want to discuss something that I discovered while thinking about its answer.

Game theory (reading the historical record backwards)

First, I already knew more or less how it would answer; I was asking the question to get the details. Ernst Zermelo explicated the tree structure in a paper published in 1913. The Wikipedia entry, “Zermelo’s theorem (game theory),” opens like this:

In game theory, Zermelo's theorem is a theorem about finite two-person games of perfect information in which the players move alternately and in which chance does not affect the decision making process. It says that if the game cannot end in a draw, then one of the two players must have a winning strategy (i.e. can force a win).

That clearly says that Zermelo’s theorem belongs to that branch of investigation known as game theory. But, thought I to myself, wasn’t game theory invented by John von Neumann in the middle of the 20th century? So I did an Ngram search on “game theory”:

The chart doesn’t lie. “Game theory” shows up in the middle of the century, not the beginning. So I did a bit more digging and found an article, “Zermelo and the Early History of Game Theory,” that begins with this sentence: “It is generally agreed that the first formal theorem in the theory of games was proved by E. Zermelo in an article on Chess appearing in German in 1913 (Zermelo (1913)).” So, Zermelo’s theorem was retrospectively assimilated into game theory once game theory had become a recognized field of inquiry. Culture works like that.

The same thing has happened to the diamond-water paradox that Tyler presents early in the first chapter as THE paradigmatic example of marginalist reasoning (page 4 in the PDF). Tyler presents part of that story in the book. I dug out more of the story by querying the accompanying AI. I won’t go into the details, but it boils down to this:

  1. Galileo presents it in the 17th century, but in a context where it’s part of a debate between the Ptolemaic and Copernican cosmologies.
  2. Adam Smith picks it up in The Wealth of Nations (1776), where it’s framed as “a point about the inadequacy of use-value as an explanation of price” (AI’s wording).
  3. Jevons frames it as an example of marginalism (1871).
  4. Paul Samuelson uses it in his 1948 textbook, Economics: An Introductory Analysis, which enshrines it in economic doctrine as the prime example of marginalist thinking.

And that textbook went on to become one of the most important introductory texts in economics.

And so it goes. 

Note: At times Claude refers to my "Play book." That's the book I've been working on, and with ChatGPT as well: Play: How to Stay Human in the AI Revolution.

* * * * *

I now give you my dialog with Claude. Zermelo’s theorem is the start. From there:

  • Chess as a HUMAN game;
  • Jazz improvisation, constraint, freedom, and dialog;
  • AlphaZero’s stylistic distinctiveness;
  • Machine intelligence and human intelligence;
  • Language as a domain for computation; and finally,
  • Chess as a specialized form of language.

I may return to those last three topics in a later post. 

* * * * * 

What are the prerequisites for Zermelo’s theorem?

I’ve got a question about chess, one that’s related to Chapter 3 of Tyler Cowen’s book, though he doesn’t discuss chess anywhere. But he is concerned about why economics seems to come so late in intellectual history. So, why, given how long chess has been played, was it only in the early 20th century that the tree-structure of the game was recognized? On the one hand, since the chess tree is so large, it would be impossible to list the whole thing and determine its finite nature that way. But, abstractly considered, one might imagine getting a hint in that direction by trying to list the games. White has 20 possible moves to start (even if most of them are never played), and Black has 20 possible replies to each of those. At this point the chess tree is 400 branches wide, and it rapidly gets much worse. On second thought, no, that won’t work.

You need a method of reasoning. And that didn’t exist until…. Well, Zermelo’s proof dates from 1913. Just when the relevant math came into existence, I don’t know, but I doubt it would have been much before then. I’m sensing intuitively that we’re in the same ballpark as Cantor’s diagonal method, which was late 19th century.

Your intuition about the timing is sharp, and the Cantor connection is real — but the deepest reason runs even further into the foundational transformation of mathematics than just the diagonal argument. Let me develop this.

What the proof actually requires

Zermelo's 1913 paper is titled, revealingly, "On an Application of Set Theory (Mengenlehre) to the Theory of Chess Games." The title tells you everything: this is explicitly an application of Cantor's set theory. Zermelo needed three things that didn't exist before the late 19th century:

First, the concept of a complete strategy as a mathematical object — not a heuristic or a rule of thumb but a function that specifies a move for every possible position that could arise. This is a first-class set-theoretic object: a mapping from the (enormous but finite) set of all positions to the set of available moves. You cannot even state Zermelo's theorem without this concept. Before Cantor, "function" meant something like a formula or rule that you computed — the idea of a function as an arbitrary set-theoretic object, definable over an enormous domain without being explicitly written down, was simply not available.

Second, backward induction as a proof method on a finite tree of unbounded but determinate depth. You need to show that the set of positions from which White can force a win is well-defined, even though it's a subset of an incomprehensibly large structure. The argument works by showing that this set is recursively characterizable from the terminal positions backward — but "recursively characterizable over a structure you cannot enumerate" requires treating the whole tree as a completed mathematical object, not an unfolding process.

Third, Zermelo's own axiomatization of set theory (1908) — which came five years before the chess paper. The Axiom of Choice, in particular, is needed in the general case (for games of possibly unbounded length before the 50-move rule imposes finitude), and Zermelo had just invented it.
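[A note from me, interrupting Claude: backward induction is easy to see in miniature. The following Python sketch solves a toy game tree by exactly the recursion described above, working from terminal outcomes back to the root. The tree, the node labels, and the outcomes are all invented for illustration; nothing remotely like this scales to the real chess tree.]

```python
# Zermelo-style backward induction on a tiny invented game tree.
# Convention: +1 = forced win for White, 0 = draw, -1 = forced win for Black.

def solve(node, tree, outcomes, white_to_move=True):
    """Value the player to move can force from `node`, by backward induction."""
    if node in outcomes:                 # terminal position: outcome is given
        return outcomes[node]
    values = [solve(child, tree, outcomes, not white_to_move)
              for child in tree[node]]
    # White picks the branch best for White; Black picks the one worst for White.
    return max(values) if white_to_move else min(values)

# Toy tree: White chooses "a" or "b" at the root, Black replies, game ends.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
outcomes = {"a1": +1, "a2": -1, "b1": 0, "b2": 0}

print(solve("root", tree, outcomes))  # prints 0: White can force at least a draw
```

[Zermelo’s trichotomy falls out of the return value at the root, +1, 0, or -1, without ever listing a winning line.]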

The Cantor connection is precise

You're right that it's the same ballpark, but it's more than approximate temporal proximity. The specific intellectual move Cantor pioneered that made Zermelo possible was the treatment of infinite (or astronomically large) structures as completed wholes — actual infinities, not potential infinities. Before Cantor, the dominant view (even among mathematicians, following Aristotle) was that infinity was always a process, never an object. You could always add one more, but you couldn't speak of "the set of all natural numbers" as a single thing you could reason about.

The chess tree is finite, but it's so large (roughly 10¹²⁰ possible games) that the conceptual move required is structurally identical to Cantor's: you cannot approach it by enumeration or by any incremental process. You have to treat the entire tree as a completed object and reason about its global properties by abstract methods. Cantor's diagonal argument is the purest example of this — you prove something about an uncountable structure by an argument that never touches most of its elements. Zermelo's chess proof does the same thing: it proves that a particular position is a forced win for White (or a draw) without ever specifying which moves constitute the winning strategy.

The Rank 4 structure

The cognitive rank analysis maps cleanly onto this. Rank 3 chess thinking is exactly what the great 19th-century practitioners developed: Morphy's tactical superiority, Steinitz's positional principles, opening theory, endgame theory. These are all algorithmic in the relevant sense — they are rules, patterns, procedures for navigating the game. This is sophisticated, even brilliant, but it's thinking with chess moves and positions as primitives.

The Rank 4 move is to ask: what kind of object is a chess game? Answer: a finite tree with two alternating agents, terminal nodes with assigned outcomes, and the question of whether a particular class of strategies (those guaranteeing a given outcome regardless of opponent play) is non-empty. You've taken the activity of playing chess — the Rank 3 algorithms and heuristics — and made the activity itself an object of mathematical modeling. That's precisely your formulation: the operations of the lower level become the objects of the next.

And notice what this means for the specific threshold you identified. The reason even imagining that the game is finite and tree-structured doesn't help you until you have the right tools is that the question you need to ask — "is there a complete strategy that guarantees outcome X?" — cannot be formulated without the concept of a complete strategy as a set-theoretic object. Knowing the tree is there doesn't help if you have no language for the thing you're trying to prove exists.

Cowen's theme

This fits Cowen's chapter 3 argument exactly. His diagnosis for why economics was so slow is that economic ideas require peering around a corner — you can't reach them by brute force, and you don't know where to look until the relevant conceptual apparatus exists. The chess case is if anything cleaner, because here the brute force failure is quantifiably obvious: 10¹²⁰ is not a number you can work through. The delay isn't from lack of effort or lack of institutional support or Christian resistance — the game had been played intensely for a thousand years by people who were clearly thinking hard about it. The delay was purely conceptual: the question couldn't even be asked in a mathematically tractable form until Cantor, Zermelo, and the late-19th-century foundations revolution made the requisite objects available. That's as clean an example as one could want of Cowen's "factoring a very large number" metaphor — once you have the factors (set theory, the function concept, backward induction), the result is almost immediate.

The 1913 date isn't merely contemporaneous with the Cantor-era mathematics — Zermelo himself was one of the creators of that mathematics, applying his own toolkit almost immediately to a new domain.

Structure of chess tree, sets of completed games

I have a question about the structure of the chess tree. I can imagine trees such that there is more than one path to any given terminal node. But I can also imagine trees where there is only one path from the root to any given terminal. Which kind is the chess tree? 

[Note that in my second sentence I make a mistake. The object I describe isn't a tree. Claude will correct me in its reply, but diplomatically, without actually asserting that I made a mistake.]

The chess tree has exactly one path from the root to any given node — but this requires care about what a "node" means, and the answer reveals something important about the structure.

Each node in the chess tree is not a board position (the arrangement of pieces) but a game history — the specific sequence of moves that led to the current state. That's why the tree property holds: by definition, a tree is a structure where each node has exactly one parent, so there is exactly one path from the root. The node just is that path.

The complication is that the same board position can appear at many different nodes. The position after 1. e4 e5 2. Nf3 Nc6 and the position after 1. Nf3 Nc6 2. e4 e5 are different nodes in the tree but identical board positions. So the tree is not a tree of positions — it's a tree of play sequences. If you were to collapse identical positions into single nodes, you'd get a directed acyclic graph, not a tree — and with the possibility of repeated positions (which the rules of chess actually constrain via the threefold repetition rule), you'd get cycles, making it a general graph. 

["Directed acyclic graph" is the correction.] 

This distinction matters for Zermelo. His proof works on the tree of play sequences, where the single-path property holds cleanly and backward induction is well-defined. If you tried to run the argument on the graph of board positions, the cycles would create exactly the kind of infinite regress that the proof needs to avoid. The finiteness guarantee (combined with the 50-move rule or similar constraints) is what converts what could be a graph-with-cycles into a finite tree amenable to the backward induction argument.

So the short answer: one path to each node, but many nodes can share the same board position.
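[Another note from me: the histories-versus-positions distinction can be demonstrated in a few lines of Python. The position function below is a deliberately crude stand-in for board state: it merely records which side has played which moves, forgetting their order. The move names are real algebraic notation, but the collapsing rule is an illustration, not a chess engine.]

```python
# Two different game histories (tree nodes) that transpose into the
# same board position.
history_1 = ("e4", "e5", "Nf3", "Nc6")    # 1. e4 e5 2. Nf3 Nc6
history_2 = ("Nf3", "Nc6", "e4", "e5")    # 1. Nf3 Nc6 2. e4 e5

def position(history):
    """Crude stand-in for board state: which side played which moves,
    ignoring order. Even indices are White's moves, odd indices Black's."""
    return frozenset((i % 2, move) for i, move in enumerate(history))

print(history_1 == history_2)                      # False: two distinct tree nodes
print(position(history_1) == position(history_2))  # True: one shared "position"
```

[The two histories are distinct nodes in the tree of play sequences, yet they collapse to a single node once you identify positions; that collapse is what turns the tree into a directed acyclic graph.]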

Given the size of the chess tree, there must be a vast number of completed games that have never been played by any pair of humans, vast regions that have never been explored.

Yes, and the numbers make "vast" feel like an understatement. The total number of possible chess games is estimated at around 10¹²⁰ (the Shannon number). The total number of games ever played by human beings across the entire history of chess — call it a billion serious games, which is probably generous — is around 10⁹. The ratio of explored to possible is so small it's essentially zero in any meaningful sense. The explored portion isn't even a rounding error.

A few implications worth sitting with:

The tree isn't uniformly unexplored. Human play has concentrated intensely on a tiny corridor — the opening theory that professional players memorize runs perhaps 20-30 moves deep along heavily analyzed lines, and endgame tablebases (computed by machine) have exhaustively solved all positions with 7 or fewer pieces. So the exploration is extremely uneven: some narrow paths are known with absolute certainty, and the vast surrounding darkness is essentially untouched.

This connects to something philosophically interesting about Zermelo's theorem. It guarantees that one of three outcomes — White wins with perfect play, Black wins with perfect play, or both can force a draw — is determined. The winning strategy exists as a mathematical object. But it almost certainly will never be known to any human mind, and may never be computed. The tree is too large even for exhaustive machine search at current or foreseeable computational capacities. So you have a proof of existence without any prospect of construction — a very Rank 4 situation, exactly the kind of result that would have been philosophically unacceptable to a Rank 3 mathematician committed to constructivism.

There's also something here for your Play book framework. The unexplored regions of the chess tree are a kind of standing reminder that even in a completely rule-governed, finite, deterministic domain, human knowledge is infinitesimally thin. The "space" defined by the rules vastly exceeds anything that minds — human or, so far, artificial — have actually traversed.

Chess as a HUMAN game

Surely what makes chess such an interesting game (& formal object) is the fact that it is played by two humans (at least until quite recently). The set of games completed by two humans represents a certain kind of human interaction that has been taken to completion by two humans. The structure of those games must surely reflect the personalities of the two players. Though I would also imagine that games between relatively unskilled players would not do so as well as games by very good players; the lack of skill gives less “room” to express one’s personality.

This is a rich observation and I think it opens in several directions worth distinguishing.

On the skill/personality relationship

Your intuition is right but the mechanism is worth making precise. An unskilled player frequently faces positions where there is effectively only one non-catastrophic move — the position has constrained them to a corridor so narrow that "choice" in any meaningful sense is absent. Their personality can't express itself because they aren't genuinely choosing; they're reacting to immediate threats, or missing them entirely. The game is being driven by the position, not by the player.

A highly skilled player, by contrast, regularly navigates positions where multiple continuations are all objectively reasonable — where the choice among them is genuinely a choice, reflecting something about temperament and values. Attack or consolidate? Trade queens and head for an endgame or keep the tension? Accept a structural weakness in exchange for piece activity? These are decisions that have no single correct answer, and different great players answer them differently and consistently across thousands of games. This is where personality enters.

In a sense, skill creates the conditions for personality to be expressed — not despite the constraints of strong play but because of them. The constraints eliminate the noise of blunders and reduce the game to something like genuine authorship. This is a general principle that extends well beyond chess: technical mastery is what transforms a medium from something that happens to you into something you shape.

On the recognizability of styles

The chess world has always recognized this. The great players of the 20th century had styles as distinctive as the styles of composers or writers. Tal played as if the rules of material compensation didn't apply to him, conjuring sacrifices that were sometimes unsound but psychologically devastating. Petrosian's games look almost passive until you realize he had prevented every conceivable counterplay fifteen moves in advance. Fischer was clinical, almost impersonal, eliminating complexity until the win was inevitable. Karpov squeezed. Kasparov pressurized. These aren't just impressionistic descriptions — they correspond to systematic biases in how each player distributed their choices across the branch points the game offered them.

What this means for your tree framing is that the set of games played by any particular master is a kind of signature within the tree — a probability distribution over continuations that is recognizably theirs. Two different grandmasters playing the same opening position will systematically explore different neighborhoods of the tree.

The two-player interaction point

This is perhaps the deepest part of your observation. A chess game between two strong players isn't just two independent signatures superimposed — it's a dialogue, where each player's choices are partly responses to the other's style. If you're playing Tal you may feel compelled to play more solidly than usual, accepting a different kind of game than you'd choose against Karpov. The game that results is a joint composition, an artifact of the specific interaction between those two personalities at that moment.

This means the game has a character that belongs to neither player individually — it emerges from their encounter. The chess tree as a mathematical object is indifferent to this; every node is just a node. But the human chess tree — the subset actually played — is structured by these dynamics of mutual recognition and response in a way the abstract tree entirely ignores.

This is another domain where the mathematical object and the human institution built around it come radically apart, which seems important for your Play book framework: the formal structure underdetermines the human reality almost completely.

Jazz improvisation, constraint, freedom, and dialog

Jazz improvisation is like that as well. The more technical skill you have, with your instrument, and with musical knowledge (more important), the more you are able to express yourself. Surely this is true for many human activities.

Yes, and I think it points toward something that deserves to be stated as a general principle rather than left as a family of analogies.

The paradox of constraint and freedom

The naive view is that constraints limit expression — more rules, more technique required, less room for the self. The reality seems to be the opposite: constraints are what create the space in which a self can be expressed, because they eliminate the noise that would otherwise drown the signal. The jazz musician who hasn't internalized the harmonic vocabulary of a standard is fighting the changes; the one who has internalized them so thoroughly that they're no longer conscious can play through them toward something personal. The technique has to become transparent — second nature, below the threshold of deliberate attention — before it can serve as a medium for expression rather than an obstacle to it.

This has a neurological correlate worth noting. The automatization of technique frees working memory and attention for the higher-level decisions — the phrasing, the risk-taking, the response to what the other musicians are doing. Exactly as with chess: the grandmaster who doesn't have to calculate whether a piece is hanging can spend their cognitive resources on the strategic personality-expressing choices.

The dialogic dimension again

Jazz makes the two-player chess point even more vivid, because the interaction is continuous rather than turn-by-turn, and the number of participants can be larger. A jazz ensemble is negotiating in real time — the bassist responds to the pianist, the pianist adjusts to what the soloist implies, the drummer creates a frame that all of them are simultaneously inhabiting and reshaping. The music that results belongs to none of them individually. This is what makes great live recordings irreproducible even by the same musicians the next night: the specific dialogue can't be reconstructed.

Miles Davis understood this theoretically as well as practically. His role as bandleader was partly to create conditions — harmonic openness, rhythmic ambiguity, a particular emotional atmosphere — within which his sidemen's personalities could collide productively. The constraint he imposed was loose enough to allow the interaction to generate something none of them would have produced alone.

The general principle

Across chess, jazz, and I'd argue most skilled human activities, what seems to be operating is this: a domain has to be sufficiently structured to make choices meaningful, and sufficiently open to make choices genuine. Too little structure and you have noise — the unskilled chess player, the musician who can't negotiate the changes. Too much structure and you have algorithm — a computer solving a chess endgame, a musician so bound by convention that every choice is determined. The expressive sweet spot is where the constraints are internalized to the point of transparency but the space they define remains genuinely open.

This maps interestingly onto your cognitive ranks framework. Rank 3 is where the algorithms are — where technique is explicitly procedural and conscious. What you're describing is what happens after Rank 3 mastery, when the algorithms have been so thoroughly absorbed that they drop below conscious attention and become the substrate for something that looks more like Rank 1 — intuitive, gestural, personal — but is operating at a much higher level of sophistication. The Rank 1 shaman and the jazz master both operate intuitively, but the intuitions are not the same kind of thing at all. The master's intuitions are, in a sense, compiled Rank 3. 

 [On Rank 3 and chess algorithms, recall comments about Morphy and Steinitz from earlier on in the Zermelo section.]

Which suggests that full human expression in a technical domain requires passing through explicit algorithmic mastery and out the other side into a new kind of fluency — one that carries the structure invisibly.

AlphaZero’s stylistic distinctiveness

Now, once computers started playing the game, the set of completed games came to include games between humans and computers. Some of those games will be identical to games played between humans; that is to say, some paths to completed games have been taken by both humans and computers. But there will be other games that have never been played by a pair of humans. For that matter, it is also likely that there are completed games played by a pair of humans that have not been played by a human-computer pair, though over time the number of such games will diminish. What really interests me is the fact that once very capable computer programs began playing the game, new regions of the chess tree were explored. Hasn’t it been remarked that the neural network programs developed by DeepMind seem to play a different style of game? That would reflect the fact that they’ve explored parts of the tree that humans haven’t.

Yes, this has been extensively remarked upon, and it's one of the most striking developments in the history of the game.

AlphaZero's stylistic distinctiveness

When DeepMind's AlphaZero was revealed in 2017, the chess world's reaction wasn't just that it was strong — engines had been stronger than humans for years by then. What was remarkable was how it played. It showed a systematic preference for piece activity and long-term initiative over material, willingness to sacrifice pawns for dynamic compensation that would persist 20-30 moves later, and a way of handling certain pawn structures that simply had no precedent in either human or computer play. Grandmasters described its games with words like "alien" and "from another dimension."

The contrast with traditional engines like Stockfish is as revealing as the contrast with human play. Stockfish is essentially a very fast, very deep version of the kind of calculation humans do — it searches the tree exhaustively to great depth using hand-crafted evaluation functions built from human chess knowledge. It plays recognizably "computer chess": precise, clinical, materialistic, without strategic vision in the human sense because it doesn't need one. AlphaZero, trained by self-play from random initialization, with no human games in its training data, discovered its own evaluation function — and that function weights things differently than either human intuition or human-encoded engine heuristics do.

Your tree framing captures this precisely. Stockfish and human players, despite their vast difference in tactical ability, are exploring neighboring regions of the tree — Stockfish goes much deeper along the paths humans have identified as promising, but it's operating within a shared map of what "promising" means. AlphaZero explored regions that neither humans nor traditional engines had reason to visit, because its evaluation function wasn't initialized from human knowledge. It discovered that certain positions humans had considered slightly inferior were actually rich with long-term compensation of a kind that's difficult to quantify but very real.

The epistemological implication

This has a remarkable implication for the structure of chess knowledge. It suggests that the human-explored corridor through the tree, even after centuries of intense play and decades of computer-assisted analysis, is not just quantitatively thin but directionally biased — skewed toward regions that human pattern recognition finds natural, avoiding regions that require holding long chains of non-obvious compensation in mind. The chess tree apparently contains whole strategic continents that human cognition had no reason to enter and no tools to navigate once there.

This is a genuine discovery about the game, not just about human limitations. The game contains those regions whether or not anyone explores them. AlphaZero's play is evidence that the game is richer than the entire human tradition had revealed.

The connection to your compiled-Rank-3 point

What's interesting in your framework is that AlphaZero's "style" didn't come from internalizing human technique and transcending it the way a human master does. It came from a completely different developmental path — one that never passed through explicit algorithmic human knowledge at all. Yet it arrived at something that looks, to human observers, like style in exactly the sense you identified: consistent preferences, a recognizable aesthetic, choices that reflect something like values about what kind of position is desirable.

This raises a genuinely unsettling question: is the stylistic distinctiveness of AlphaZero's play a reflection of something deep about the structure of the chess tree itself — certain topological features of the unexplored regions that any sufficiently capable explorer would find — or is it an artifact of the specific architecture and training process DeepMind used? In other words, would a differently designed neural network, also trained by self-play, discover the same stylistic territory, or different territory?

We don't fully know the answer, but it matters for how you interpret the exploration metaphor. If different architectures converge on the same regions, that suggests those regions have an objective significance in the tree's structure. If they diverge, then "exploration" is more strongly shaped by the nature of the explorer than the territory — which would be its own interesting finding.

Machine intelligence and human intelligence

And this has more general implications for the difference between human intelligence and machine intelligence.

Yes, and I think the chess case is unusually clean as an entry point into this question precisely because the domain is so well-defined. The rules are fixed, the objective is unambiguous, the tree is finite. If human and machine intelligence were going to converge anywhere, it should be here. The fact that they don't — that AlphaZero explores genuinely different territory even in this maximally constrained domain — is telling.

The initialization problem

Human chess mastery begins with human perception, human memory, human pattern recognition shaped by evolution for an environment that has nothing to do with chess. The grandmaster's intuitions are compiled Rank 3 knowledge, as we discussed, but what they're compiled from is a biological substrate with its own prior structure — one that finds certain patterns salient, certain material relationships easy to hold in mind, certain kinds of compensation intuitively graspable and others not. Human chess knowledge is a product of human minds applied to chess, which means it inherits all the biases and affordances of those minds.

AlphaZero had none of that initialization. It started from random play and discovered chess structure entirely through the feedback signal of winning and losing. So what it found reflects the structure of the game filtered through the architecture of a neural network — a very different kind of prior. The resulting "knowledge," if we want to call it that, is incommensurable with human chess knowledge in a deep sense. Not better or worse along a single dimension, but differently shaped.

This generalizes immediately. Every domain of human knowledge — not just chess but science, art, music, language, ethics — has been explored by minds with a particular biological and cultural initialization. The map we have of any domain reflects both the territory and the nature of the explorers. Machine intelligence, initialized differently, will draw different maps. Some of what it finds will correct human errors and fill genuine gaps. But some of it will simply be different — reflecting a different set of cognitive affordances rather than a more accurate picture of the territory.

The embodiment point

This connects to something central to your doppelganger framework in the Play book. Human intelligence developed in, and is thoroughly shaped by, embodied experience — a body that gets hungry, tired, hurt, that moves through space, that has a developmental history from infancy, that is embedded in relationships of dependency and care. The categories through which humans understand the world — effort, rest, danger, comfort, near and far, before and after — are grounded in that embodiment in ways that are very difficult to fully articulate, precisely because they're so fundamental they're normally invisible.

Machine intelligence lacks this grounding entirely. It processes patterns in data, but the data was generated by embodied beings in a world the machine has never inhabited. This is a different kind of initialization problem than the chess case — it's not just that the starting point is different, it's that there may be whole dimensions of the territory that are simply not accessible to an explorer without a body. Not because the machine lacks compute, but because the relevant categories can't be constructed from pattern-matching over text and images alone, however vast.

The chess case is actually the easy version of this problem, because chess doesn't require embodiment — the territory is fully specified by the rules, and a disembodied explorer can in principle cover all of it. In domains that are partly constituted by embodied human experience — ethics, aesthetics, the meaning of a piece of music — the machine's different initialization may be a more fundamental limitation, not just a different angle of approach.

The asymmetry of comprehensibility

There's a further asymmetry worth noting. Human grandmasters can watch AlphaZero's games and, with effort, learn from them — they can expand their understanding of chess into the new territory AlphaZero has explored. The knowledge is transmissible across the gap, even if imperfectly. This is partly what made AlphaZero so valuable to the chess world: it was alien but not incomprehensible.

But the direction of comprehension is not symmetric. AlphaZero has no understanding of why its games look alien to humans, no model of the human chess tradition it departs from, no grasp of what a human finds beautiful or surprising about a sacrifice. The gap is navigable from the human side in a way it isn't from the machine side — and this asymmetry may be a very general feature of the relationship between human and machine intelligence, not specific to chess.

Which suggests that the most productive framing isn't human intelligence versus machine intelligence as competing ways of doing the same thing, but human and machine intelligence as differently initialized explorers of overlapping but non-identical territories — with humans uniquely positioned to integrate what the machines find into a framework of meaning that the machines themselves cannot construct. That seems close to the core of what your Play book is arguing.

Language as a domain for computation

And then we have language, which is utterly different from chess as a domain for computational exploration. It is unbounded, infinite for all practical purposes, and deeply embodied at its core, though there are many abstract concepts that are not directly defined over physical phenomena. Justice would be one example, though it surely is grounded in the evolutionary proclivities of human action and sentiment.

The contrast with chess runs very deep, and I think it's worth unpacking exactly where the differences lie, because they accumulate into something qualitative rather than merely quantitative.

The domain structure

Chess has a finite, explicitly specified generative structure — the rules — from which the tree follows necessarily. You can hold the complete specification of the domain in a few pages. Language has no such foundation. The generative principles of language are themselves partly unknown, contested, and probably not fully formalizable. The "rules" are statistical regularities over an open-ended practice that is continuously evolving, varies across communities, and is partly constituted by its own history of use. There is no analog to the chess rulebook.

This means that LLMs exploring language are not exploring a pre-existing tree the way AlphaZero explored a pre-existing chess tree. They're exploring something more like a landscape that is partly brought into being by the exploration — because language use is itself constitutive of language. The territory and the act of traversing it are not fully separable.

The embodiment grounding

The core vocabulary of human language — the concepts that anchor everything else — is grounded in embodied experience in ways that linguists and cognitive scientists have been documenting for decades. Lakoff and Johnson's work on conceptual metaphor showed that abstract reasoning is systematically structured by physical experience: we understand argument in terms of combat, time in terms of spatial movement, understanding in terms of grasping. These aren't decorative metaphors — they're the cognitive substrate through which the abstractions are constructed and held.

This means a language model trained on text is in a peculiar epistemic position. It has processed the outputs of embodied minds in enormous quantity, but without the generative source. It has the shadows on the cave wall without the objects that cast them, to use an image that is itself — notice — a spatial metaphor for an abstract epistemological relationship. Whether that's sufficient to reconstruct something functionally equivalent to the embodied grounding, or whether it produces something systematically different in ways that matter, is genuinely uncertain. My honest assessment is that it produces something different — fluent over the surface, but with a different deep structure.

The status of justice

Your example of justice is well chosen because it sits at exactly the interesting intermediate point. It's not directly defined over physical phenomena the way "heavy" or "red" or "falling" are. But you're right that it's grounded in evolutionary proclivities — reciprocity, fairness intuitions, punishment of defectors, sensitivity to hierarchy and its legitimacy. These are detectable in human children before explicit moral reasoning develops; variants appear across human cultures with remarkable consistency, and in attenuated forms in other social primates.

So justice is abstract in the sense that it can't be pointed to, but it's not arbitrary — it's constrained from below by something real in human social and emotional life. It occupies a position in a long chain of reflective abstraction that originates in those evolutionary proclivities, passes through Rank 2 codifications in law and religious commandment, Rank 3 philosophical systematization in Plato, Kant, Mill, and arrives at Rank 4 treatments in game theory, social choice theory, mechanism design.

What a language model has is the upper end of that chain — and substantial portions of the middle — but the bottom, the evolutionary and embodied grounding, is only present as described and discussed in text, not as lived. Whether that matters for reasoning about justice in practical contexts is an open question. For many purposes it probably doesn't — the upper end of the chain is where most of the action is. But there may be places where it does matter, where the right answer requires something that can only be known from the inside of the kind of creature that has justice intuitions rather than justice descriptions.

LLMs as explorers of linguistic territory

The chess analogy does carry over in one important respect. Just as AlphaZero explored regions of the chess tree that humans hadn't visited, LLMs have in some sense mapped regions of the linguistic space — combinations, transitions, framings — that human writers hadn't explicitly traversed. Whether this constitutes genuine discovery or sophisticated interpolation over a training distribution is one of the genuinely hard questions about these systems.

But here the disanalogy reasserts itself. In chess, we can verify whether an explored region is genuinely new and whether what's found there is objectively strong. In language, the criteria are much harder to specify. Is a formulation novel in a way that matters? Does a connection between concepts constitute genuine insight or a statistical artifact? There's no equivalent of the chess win-loss signal to ground the evaluation. The territory doesn't have an objective structure that exploration can reveal in the same clean sense.

The deepest difference

Chess is a domain humans invented and then explored. Language is a domain humans are — it's not a tool they picked up but a constitutive feature of what they are as a kind of being. Heidegger's claim that language speaks through us rather than we through it is an overstatement, but it points at something real: humans don't stand outside language and use it, they think, remember, plan, and relate to each other in it, from inside.

A language model is outside language in a way that no human is, processing it as a structured domain of patterns rather than inhabiting it as a medium of existence. This is the deepest version of the initialization difference — not just a different starting point but a fundamentally different relationship to the territory being explored. Which returns us to the embodiment point and to what seems to be the core of your doppelganger argument: that a genuinely human-adjacent AI would need to come to language the way humans do, from inside a developing life, not from outside a training corpus.

Chess as a specialized form of language

And if you think of chess as a very specialized kind of language, one whose primitive terms are defined over a board and the pieces you place on it, then chess is one of the products of human language, a Rank 2 product.

That's a precise and illuminating placement, and it retroactively sharpens several threads from our earlier discussion.

Chess as Rank 2 product

The defining feature of Rank 2 is metalingual definition — explicit rule specification that is stable, transmissible in writing, and independent of context. Chess is exactly this: a system whose primitive terms (the pieces, the board) are defined entirely by explicit rules of movement and capture, with no residue of ambiguity or contextual variation. You cannot learn chess the way you learn your native language — through immersion, inference from use, gradual internalization. You have to be told the rules explicitly, in propositional form. The game doesn't exist until those definitions are in place, and it is fully constituted by them. That's a Rank 2 object in the precise sense.

It also means chess required writing for its stable transmission. The rules are too complex and too precise to survive oral tradition without drift — the specific movement of the knight, the conditions for en passant and castling, the definition of checkmate. Writing fixes them, makes them portable across time and culture, and makes it possible to play the same game in 15th-century Spain and 21st-century New York.

The genealogical chain this reveals

This gives us a clean four-level stack:

Natural language itself is the Rank 1 medium — speech, metaphor, the original cognitive technology.

Chess is a Rank 2 product built within and out of that medium — a formal language whose primitives are defined by metalingual rules.

The rich human practice of chess play, including the stylistic traditions, opening theory, the culture of the game, is Rank 3 elaboration — algorithmic, systematic, teachable knowledge built over the Rank 2 foundation.

And Zermelo's 1913 proof is the Rank 4 move — taking the entire Rank 2/3 structure as an object of mathematical modeling and asking what kind of thing it is.

The thousand-year gap between chess-as-game and chess-as-mathematical-object is now legible as the gap between Rank 2 and Rank 4, with Rank 3 filling the intervening centuries with increasingly sophisticated but non-reflective practice. Rank 4 couldn't happen until the conceptual apparatus existed to take the game as object — which required Cantor, set theory, and the foundational revolution, as we established. But it also required that the object being analyzed was itself sufficiently well-defined to be mathematized — and that's the Rank 2 condition. A folk game with fuzzy rules and regional variations couldn't have been Zermelo's subject matter.
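Zermelo's move can be made concrete on a game small enough to solve outright. The following is a minimal illustrative sketch, my own construction rather than anything from Zermelo: backward induction over the complete tic-tac-toe tree, which computes the game's value under best play in exactly the sense Zermelo showed must exist for chess, where the tree is simply too large to traverse.

```python
from functools import lru_cache

# Backward induction (Zermelo's method) over the full tic-tac-toe tree.
# A board is a tuple of 9 cells, each ' ', 'X', or 'O'; X moves first.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Game value from X's perspective: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w:
        return 1 if w == 'X' else -1
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0  # board full, no winner: draw
    nxt = 'O' if player == 'X' else 'X'
    children = [value(board[:i] + (player,) + board[i + 1:], nxt)
                for i in moves]
    # X maximizes the value, O minimizes it: pure backward induction.
    return max(children) if player == 'X' else min(children)

print(value((' ',) * 9, 'X'))  # tic-tac-toe is a draw under best play: 0
```

The whole tree collapses into a single number; for chess the identical argument goes through in principle, and only the size of the tree stands in the way.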

The broader implication

This suggests a general pattern: Rank 4 analysis requires Rank 2 objects. The formalization that Rank 2 achieves — explicit definition, stable rule specification, written transmission — is what makes a domain available for the kind of modeling that Rank 4 performs. You can't take something as a mathematical object unless it's already been made sufficiently object-like by prior formalization.

This may be part of why economics was so late, by the way. Before there was a sufficiently precise Rank 2 specification of what supply and demand are — before the concepts were stabilized and defined with enough rigor — marginalism couldn't perform its Rank 4 move on them. Adam Smith is partly a Rank 2 achievement in this sense: fixing the vocabulary, stabilizing the concepts, making the domain available for subsequent formalization. Jevons then does to Smith what Zermelo does to chess.

And it raises an interesting question about natural language itself — the one domain that resists this treatment most stubbornly. Natural language has never been given a complete Rank 2 specification, and may be incapable of one, which is precisely why it remains so resistant to the kind of clean mathematical analysis that chess and formal economics permit. It's the medium in which Rank 2 objects are constructed, which may be why it can't itself become one. 

* * * * * 

Note: I'm not sure what I think of this last section, and some other things as well, but this last section in particular. Claude just made it up quite recently. I've only read it twice, once when I had the original conversation and again as I've prepared it for the blog. That's not enough for me to reach settled conclusions on what is a very rich discussion. I don't know enough about chess to have a serious opinion on some things and much about cognitive ranks is still very much under discussion. But I have no qualms putting this out in that spirit: Open for discussion.
