Late in his fourth chapter, Cowen introduces chess as an example of what contemporary AI can do. As you may know, chess has been a central interest of AI, so much so that John McCarthy, the mathematician who coined the term “artificial intelligence,” wrote an article entitled “Chess as the Drosophila of AI.” You may also know that chess has been a central interest of Cowen’s. He was a chess champion in his youth and he follows the game closely.
Recognition of the chess tree comes late
As it happens, chess presents us with one of those examples that Tyler finds so interesting in this book (The Marginal Revolution: Rise and Decline, and the Pending AI Revolution). It was during my freshman year at Johns Hopkins, I believe, that I read some Dover Publications book that was an omnibus presentation of information, cybernetics, and computing. I forget both the author and the exact title, but I remember two things. 1) In its presentation of computing, it talked about both analog computing and digital computing. That was common until not long after personal computers arrived; after that, articles and books about “computers for dummies” stopped talking about analog computing and concentrated on digital (big mistake, IMO, but that’s a different story). 2) It talked about chess and made the point that, from an abstract point of view, chess was just like tic-tac-toe, an utterly trivial game. Both games are finite, and all their possible games can be arranged in a tree structure. But the chess tree is so large that not even the largest computer can list all its games. I thought about it a little, got the point, and it has stuck with me ever since.
That was back in the mid-1960s. The largest computers of that time were tiny in comparison to the Brobdingnagian behemoths used to train contemporary AIs, but even those behemoths are still too small for the chess tree.
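The tic-tac-toe tree, by contrast, is small enough for an ordinary laptop to walk in full. Here is a minimal Python sketch (an illustration, not anything from the Dover book), assuming a game ends as soon as someone completes a line or the board fills up. The hypothetical `count_games` function visits every root-to-leaf path of the tree and counts them; the total it reports, 255,168, is the commonly cited number of distinct tic-tac-toe games.

```python
# Winning lines on a 3x3 board, squares indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board, player):
    """Count the complete games (root-to-leaf paths) below this position."""
    if winner(board) is not None or all(sq is not None for sq in board):
        return 1  # a leaf: someone has won, or the board is full
    total = 0
    nxt = 'O' if player == 'X' else 'X'
    for i in range(9):
        if board[i] is None:
            board[i] = player
            total += count_games(board, nxt)
            board[i] = None  # undo the move before trying the next square
    return total


if __name__ == "__main__":
    print(count_games([None] * 9, 'X'))  # 255168
```

The same brute-force walk is correct in principle for chess; it is only the size of the tree that rules it out.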
The thing is, and here we’re getting to the Tylerian point, chess has been played by thoughtful people for centuries. Why was it only in the early 20th century that its tree-like structure was recognized? That’s so simple, and so basic.
So I put the question to Claude, who answered. And then we went on from there. I’ll present that whole dialog shortly, but I want to discuss something that I discovered while thinking about its answer.
Game theory (reading the historical record backwards)
First, I already knew more or less how it would answer; I was asking the question to get details. Ernst Zermelo explicated the tree structure in a paper published in 1913. The Wikipedia entry, “Zermelo’s theorem (game theory),” opens like this:
In game theory, Zermelo's theorem is a theorem about finite two-person games of perfect information in which the players move alternately and in which chance does not affect the decision making process. It says that if the game cannot end in a draw, then one of the two players must have a winning strategy (i.e. can force a win).
That clearly says that Zermelo’s theorem belongs to that branch of investigation known as game theory. But, thought I to myself, wasn’t game theory invented by John von Neumann in the middle of the 20th century? So I did an Ngram search on “game theory”:
The chart doesn’t lie. “Game theory” shows up in the middle of the century, not the beginning. So I did a bit more digging and found an article, “Zermelo and the Early History of Game Theory,” that begins with this sentence: “It is generally agreed that the first formal theorem in the theory of games was proved by E. Zermelo in an article on Chess appearing in German in 1913 (Zermelo (1913)).” So, Zermelo’s theorem was retrospectively assimilated into game theory once game theory had become a recognized field of inquiry. Culture works like that.
The same thing has happened to the diamonds-water paradox that Tyler presents early in the first chapter as THE paradigmatic example of marginalist reasoning (page 4 in the PDF). Tyler presents part of that story in the book. I dug out more of the story by querying the accompanying AI. I won’t go into the details, but it boils down to this:
- Galileo presents it in the 17th century, but in a context where it’s part of a debate between the Ptolemaic and Copernican cosmologies.
- Adam Smith picks it up in The Wealth of Nations (1776), where it’s framed as “a point about the inadequacy of use-value as an explanation of price” (AI’s wording).
- Jevons frames it as an example of marginalism (1871).
- Paul Samuelson uses it in his 1948 textbook, Economics: An Introductory Analysis, which enshrines it in economic doctrine as the prime example of marginalist thinking.
And that textbook went on to become one of the most important introductory texts in economics.
And so it goes.
Note: At times Claude refers to my "Play book." That's the book I've been working on, with Claude and with ChatGPT as well: Play: How to Stay Human in the AI Revolution.
* * * * *
I now give you my dialog with Claude. Zermelo’s theorem is the start. From there:
- Chess as a HUMAN game;
- Jazz improvisation, constraint, freedom, and dialog;
- AlphaZero’s stylistic distinctiveness;
- Machine intelligence and human intelligence;
- Language as a domain for computation; and, finally,
- Chess as a specialized form of language.
I may return to those last three topics in a later post.
* * * * *
What are the prerequisites for Zermelo’s theorem?
I’ve got a question about chess, one that’s related to Chapter 3 of Tyler Cowen’s book, though he doesn’t discuss chess anywhere. But he is concerned about why economics seems to come so late in intellectual history. So, why, given how long chess has been played, was it only in the early 20th century that the tree structure of the game was recognized? On the one hand, since the chess tree is so large, it would be impossible to list the whole thing and determine its finite nature that way. But, abstractly considered, one might imagine getting a hint in that direction by trying to list the games. White has 20 possible moves to start (even if most of them are never played), and Black has 20 possible replies to each of those. At this point the chess tree is 400 branches wide, and it rapidly gets much worse. On second thought, no, that won’t work.
You need a method of reasoning. And that didn’t exist until… Well, Zermelo’s proof dates from 1913. Just when the relevant math existed, I don’t know, but I doubt it would have been much before then. I’m sensing intuitively that we’re in the same ballpark as Cantor’s diagonal method, which was late 19th century.
Your intuition about the timing is sharp, and the Cantor connection is real — but the deepest reason runs even further into the foundational transformation of mathematics than just the diagonal argument. Let me develop this.
What the proof actually requires
Zermelo's 1913 paper is titled, revealingly, "On an Application of Set Theory (Mengenlehre) to the Theory of Chess Games." The title tells you everything: this is explicitly an application of Cantor's set theory. Zermelo needed three things that didn't exist before the late 19th century:
First, the concept of a complete strategy as a mathematical object — not a heuristic or a rule of thumb but a function that specifies a move for every possible position that could arise. This is a first-class set-theoretic object: a mapping from the (enormous but finite) set of all positions to the set of available moves. You cannot even state Zermelo's theorem without this concept. Before Cantor, "function" meant something like a formula or rule that you computed — the idea of a function as an arbitrary set-theoretic object, definable over an enormous domain without being explicitly written down, was simply not available.
Second, backward induction as a proof method on a finite tree of unbounded but determinate depth. You need to show that the set of positions from which White can force a win is well-defined, even though it's a subset of an incomprehensibly large structure. The argument works by showing that this set is recursively characterizable from the terminal positions backward — but "recursively characterizable over a structure you cannot enumerate" requires treating the whole tree as a completed mathematical object, not an unfolding process.
Third, Zermelo's own axiomatization of set theory (1908), which came five years before the chess paper. The Axiom of Choice, in particular, is needed in the general case (for games of possibly unbounded length before the 50-move rule imposes finitude), and Zermelo himself had introduced it in 1904.
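The first two prerequisites, a complete strategy as a function and backward induction over a finite tree, can be seen at toy scale in tic-tac-toe. The Python sketch below illustrates the standard modern backward-induction argument; it is not a reconstruction of Zermelo's own 1913 proof, and the names `value`, `strategy`, and `EMPTY` are hypothetical. `value` labels positions +1, 0, or -1 working up from the terminal positions, and `strategy` builds a dict assigning a move to every reachable non-terminal position: the kind of object described above, just small enough to write out.

```python
from functools import lru_cache

# Winning lines on a 3x3 board, squares indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]
EMPTY = "........."  # boards are 9-character strings; '.' is an empty square


def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def to_move(board):
    # X moves first, so X is to move whenever the counts are equal.
    return "X" if board.count("X") == board.count("O") else "O"


def play(board, i, player):
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def value(board):
    """+1 if X can force a win, -1 if O can, 0 if both can force a draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # full board with no winner: a draw
    player = to_move(board)
    child_values = [value(play(board, i, player))
                    for i in range(9) if board[i] == "."]
    # Backward induction: X takes the best child for X, O the worst.
    return max(child_values) if player == "X" else min(child_values)


def strategy(board=EMPTY, table=None):
    """A complete strategy: a dict from every reachable non-terminal
    position to an optimal square for the player to move."""
    if table is None:
        table = {}
    if board in table or winner(board) is not None or "." not in board:
        return table
    player = to_move(board)
    moves = [i for i in range(9) if board[i] == "."]
    pick = max if player == "X" else min
    table[board] = pick(moves, key=lambda i: value(play(board, i, player)))
    for i in moves:
        strategy(play(board, i, player), table)
    return table
```

Calling `value(EMPTY)` returns 0, the familiar fact that tic-tac-toe is a draw with best play, and `strategy()` returns a table with a few thousand entries. The function is defined over every position that could arise, not just the ones a given game happens to visit.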
The Cantor connection is precise
You're right that it's the same ballpark, but it's more than approximate temporal proximity. The specific intellectual move Cantor pioneered that made Zermelo possible was the treatment of infinite (or astronomically large) structures as completed wholes — actual infinities, not potential infinities. Before Cantor, the dominant view (even among mathematicians, following Aristotle) was that infinity was always a process, never an object. You could always add one more, but you couldn't speak of "the set of all natural numbers" as a single thing you could reason about.
The chess tree is finite, but it's so large (roughly 10¹²⁰ possible games) that the conceptual move required is structurally identical to Cantor's: you cannot approach it by enumeration or by any incremental process. You have to treat the entire tree as a completed object and reason about its global properties by abstract methods. Cantor's diagonal argument is the purest example of this: you prove something about an uncountable structure by an argument that never touches most of its elements. Zermelo's chess proof does the same thing: it shows that the game is determined, that either one side can force a win or both can force at least a draw, without ever saying which of these holds or which moves constitute the winning strategy.
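The 10¹²⁰ figure is usually credited to Claude Shannon's 1950 paper "Programming a Computer for Playing Chess," which reaches it by rough arithmetic rather than by counting: about 10³ possibilities for each pair of moves (one by White, one by Black), over a typical game of about 40 such pairs.

```latex
N_{\text{games}} \approx \left(10^{3}\right)^{40} = 10^{120}
```

It is an order-of-magnitude estimate, not an exact count, but it makes the point: even at a trillion games per second, listing them all would take on the order of 10¹⁰⁰ years.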
The Rank 4 structure
The cognitive rank analysis maps cleanly onto this. Rank 3 chess thinking is exactly what the great 19th-century practitioners developed: Morphy's tactical superiority, Steinitz's positional principles, opening theory, endgame theory. These are all algorithmic in the relevant sense — they are rules, patterns, procedures for navigating the game. This is sophisticated, even brilliant, but it's thinking with chess moves and positions as primitives.
The Rank 4 move is to ask: what kind of object is a chess game? Answer: a finite tree with two alternating agents, terminal nodes with assigned outcomes, and the question of whether a particular class of strategies (those guaranteeing a given outcome regardless of opponent play) is non-empty. You've taken the activity of playing chess — the Rank 3 algorithms and heuristics — and made the activity itself an object of mathematical modeling. That's precisely your formulation: the operations of the lower level become the objects of the next.
And notice what this means for the specific threshold you identified. Merely imagining that the game is finite and tree-structured doesn't help until you have the right tools, because the question you need to ask, "is there a complete strategy that guarantees outcome X?", cannot be formulated without the concept of a complete strategy as a set-theoretic object. Knowing the tree is there doesn't help if you have no language for the thing you're trying to prove exists.
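The Rank 4 description of a chess game, a finite tree with two alternating agents, terminal nodes with assigned outcomes, and the question of whether a class of strategies is non-empty, translates almost directly into a data structure. A minimal Python sketch follows; the `Node` class, `can_force`, and the miniature tree are hypothetical illustrations, with outcomes written in the usual chess notation 1-0, 0-1, and 1/2.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A position in a finite two-player game tree (hypothetical toy model)."""
    to_move: Optional[str] = None   # "White" or "Black"; None at a leaf
    outcome: Optional[str] = None   # "1-0", "0-1", or "1/2" at a leaf
    children: list = field(default_factory=list)


def can_force(node, player, good):
    """True if `player` has a strategy guaranteeing an outcome in `good`
    whatever the opponent does, i.e. that class of strategies is non-empty."""
    if not node.children:
        return node.outcome in good
    results = (can_force(child, player, good) for child in node.children)
    # On your own move one good continuation suffices;
    # on the opponent's move every continuation must be good.
    return any(results) if node.to_move == player else all(results)


def leaf(result):
    return Node(outcome=result)


# A made-up miniature game: far too small to be chess, but the same kind of object.
toy = Node("White", children=[
    Node("Black", children=[leaf("1-0"), leaf("1/2")]),
    Node("Black", children=[leaf("1-0"),
                            Node("White", children=[leaf("0-1"), leaf("1-0")])]),
])
```

In this toy tree White can force a win by choosing the second branch, so `can_force(toy, "White", {"1-0"})` is `True`, while `can_force(toy, "Black", {"0-1", "1/2"})` is `False`: Black cannot even guarantee a draw. Nothing in `can_force` refers to particular moves or positions; the activity of playing has itself become the object being reasoned about.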
Cowen's theme
This fits Cowen's chapter 3 argument exactly. His diagnosis for why economics was so slow is that economic ideas require peering around a corner: you can't reach them by brute force, and you don't know where to look until the relevant conceptual apparatus exists. The chess case is, if anything, cleaner, because here the failure of brute force is quantifiably obvious: 10¹²⁰ is not a number you can work through. The delay isn't from lack of effort or lack of institutional support or Christian resistance; the game had been played intensely for a thousand years by people who were clearly thinking hard about it. The delay was purely conceptual: the question couldn't even be asked in a mathematically tractable form until Cantor, Zermelo, and the late-19th-century foundations revolution made the requisite objects available. That's as clean an example as one could want of Cowen's "factoring a very large number" metaphor: once you have the factors (set theory, the function concept, backward induction), the result is almost immediate.
The 1913 date isn't merely contemporaneous with the Cantor-era mathematics — Zermelo himself was one of the creators of that mathematics, applying his own toolkit almost immediately to a new domain.