
Tuesday, April 16, 2024

AI, Chess, and Language 3.3: Chess and language as models, a philosophical interlude

I now want to take a look at AI, chess, and language from a Piagetian point of view. While he is best known for his work in developmental psychology, Piaget was also interested in the development of concepts over historical time, which he called genetic epistemology, and more generally, in the construction of mental mechanisms. He was particularly interested in abstraction and in something he called reflective abstraction. The concept is a slippery one. Ernst von Glasersfeld has a useful account (Abstraction, Re-Presentation, and Reflection: An Interpretation of Experience and of Piaget’s Approach) from which I abstract, if you will, perhaps an over-simplified idea, which is that major steps in cognitive evolution involve taking a mechanism that is operative at one level and making it an object which is operated on by mechanisms at a new and higher level.

Let us take chess as an example. We know that all possible chess games can be arranged as a tree – something we examined in an earlier post, Search! What enables us to entertain the idea that chess is a paradigmatic case of cultural evolution? But we have only known that since the mathematician Ernst Zermelo published “Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels” (“On an application of set theory to the theory of chess”) in 1913. Ever since the game emerged, players have been exploring that tree, but without explicitly knowing it. It was only when Zermelo published the idea that the chess tree became an object that could be explicitly examined and explored as such.

I don’t know when that idea crossed into the chess world. In a quick search I found out that Alexander Kotov used it in a book, Think Like a Grandmaster, which was translated into English in 1971. Kotov wrote of building an “analysis tree.” I assume that chess players became aware of the idea earlier than that, perhaps not long after Zermelo’s paper was published. In any event, for my present purposes, the date is irrelevant. What is important is simply that it happened. The tree structure has been central to all work in computer chess.

The tree structure is central to the activity of search. But there is more to chess than searching for possible moves. The moves must be evaluated and a strategy has to be executed. Various means have been developed to do those things with the result that computers can now play chess better than any human. Chess is a “solved” problem. And the components of various solutions are objects for explicit examination and design.

Almost.

Unlike earlier chess programs, which were based entirely on symbolic technology, some of the most recent ones, such as AlphaZero, use neural nets for the evaluation function. What those nets are doing is opaque. We know how to build them, but we don’t know what they do.

And that brings us to language.

Language has been investigated for centuries. More specifically, it has been subject to formal analysis for the last three quarters of a century, but cognitive scientists have come to little agreement about syntax, much less semantics. Nonetheless large language models are now capable of very impressive language performance. Like all neural network models, however, these models are opaque. But what if we could figure out how they worked internally?

Consider the following diagram, which I commented on in my paper on GPT-3: GPT-3: Waterloo or Rubicon? Here be Dragons:

Texts are one product of the interaction of the human mind and the world. LLMs are trained on large bodies of these texts. It follows that the internal structure of these models must somehow, we don’t know how, reflect the nature of that interaction. If we could understand the internal structure of these models, wouldn’t that be a reflective abstraction over the processes of the human mind in the same way that the chess tree is a reflective abstraction over the human mind as it is engaged in the game of chess?

Yes, the chess tree is not all of chess, but only a part. And we know how to augment that part with evaluation functions. Figuring out how LLMs work would not be equivalent to knowing how the mind works, but might be a start. To know and be able to manipulate the conceptual structure that is latent in LLMs, that would be a major intellectual accomplishment.

Wednesday, April 10, 2024

AI, Chess, and Language 3.2: Search! The case of language

In the previous post in this series I considered the issue of search in the context of chess: AI, Chess, and Language 3.1: Search! What enables us to entertain the idea that chess is a paradigmatic case of cultural evolution? Now I want to look at search in the case of language and texts, where the computational problems are quite different from those of chess.

In particular, as I pointed out in the first post in this series, chess and language have very different geometric footprints in the world. The geometric footprint of chess is quite simple and sharply specified, being limited to the board and the pieces. The geometric footprint of natural language is quite different. Since it must deal with the world in an open-ended way, its geometric footprint must encompass the world. The upshot of this difference is that, while it is possible to list all possible chess games and organize them into a tree (at least in principle, if not in actuality), there doesn’t seem to be any way to list all possible intelligible texts and no principled way to organize them. Note that I’m not talking about listing all possible combinations of words, as most of those combinations would be nonsense. I’m talking about intelligible texts, which is quite a different matter and is furthermore subject to change as language changes.

Nonetheless I think it is instructive to consider two very different cases, parsing syntax and document retrieval.

Parsing Syntax

For several decades now the study of language has focused on syntax: What are the rules that govern sentences? How can we characterize an infinite set of sentences using a finite set of syntactic rules? One aspect of the rules is a finite list of word types: nouns, verbs, adjectives, and the like. Then we have a finite set of rules that govern how word types can be organized into sentences.

Why is it important to have a finite set of word types? While the vocabulary of a language is finite at any given moment in time, that vocabulary is subject to change, and may well grow over time. If, however, we can classify every word as an example of some particular word type, allowing that some words may function as two or more types, then the actual composition of the vocabulary at any one time doesn’t matter. We can define syntax in terms of word types alone.

In computational linguistics, parsing is the process of searching through the universe of possible sentence descriptions for one that matches a given sentence. A wide variety of strategies have been developed for parsing, with varying degrees of success. They tend to be subject to the problem of combinatorial explosion: as sentences become longer and more complex, the number of alternatives that must be tried increases, and tends to do so exponentially. Nonetheless parsers have been developed that are useful and usable in various contexts.
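To make parsing-as-search concrete, here is a minimal sketch of my own, not any particular parser from the literature. The toy grammar, toy lexicon, and example sentence are invented for illustration; a recursive procedure simply counts the analyses of a word string.

    # A toy context-free grammar: each category maps to sequences of categories.
    GRAMMAR = {
        "S":  [("NP", "VP")],
        "NP": [("Det", "N"), ("NP", "PP"), ("Pro",)],
        "VP": [("V", "NP"), ("VP", "PP")],
        "PP": [("P", "NP")],
    }
    # A toy lexicon: which words count as instances of which word types.
    LEXICON = {
        "Det": {"the"},
        "N":   {"man", "telescope", "park"},
        "Pro": {"I"},
        "V":   {"saw"},
        "P":   {"with", "in"},
    }

    def count_parses(cat, words):
        """Count the distinct ways category `cat` can be matched to the word string."""
        total = 1 if len(words) == 1 and words[0] in LEXICON.get(cat, ()) else 0
        for rhs in GRAMMAR.get(cat, []):
            if len(rhs) == 1:                      # unary rule, e.g. NP -> Pro
                total += count_parses(rhs[0], words)
            else:                                  # binary rule: try every split point
                left, right = rhs
                for i in range(1, len(words)):
                    total += (count_parses(left, words[:i]) *
                              count_parses(right, words[i:]))
        return total

    sentence = tuple("I saw the man with the telescope".split())
    print(count_parses("S", sentence))  # 2 analyses: who has the telescope?

Appending “in the park” raises the count to five, and each further phrase multiplies the number of analyses; that is the explosion a practical parser has to tame.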

However, all we get from this are accounts of why this or that sentence is permissible. The following sentence is grammatically correct, but meaningless:

Colorless green ideas sleep furiously.

This is, of course, Chomsky’s famous example, which he introduced in Syntactic Structures to make the point that syntax is independent of semantics.

The poet John Hollander made that sentence tractable by adding two lines before it so that the three lines together constitute a (meaningful) poem:

Curiously deep, the slumber of crimson thoughts:
While breathless, in stodgy viridian
Colorless green ideas sleep furiously.

That is a matter of semantics, and semantics seems to be ill-defined and unbounded.

Document Retrieval

Now let’s think about going to a library to find a book. Each book is assigned a number and the books are placed on shelves according to number. To find a book you need to know its number.

How do you find that? In the old days libraries had card catalogues, which still exist. One catalogue was organized according to authors’ names; another by book titles; and a third by subject category. If you knew either the title or the author’s name, you would look it up in the appropriate catalogue to find the book number. If you simply wanted to search by topic, you would go to the subject catalogue and start browsing through it.

Electronic library catalogues have been available for decades. I can go online and do an electronic search through the catalogue of my local library. Author, title, and subject searches are available, and I can also search by keyword.

Starting back in the 1970s, however, Gerard Salton and his colleagues developed more sophisticated methods for searching collections of electronic documents. Such documents typically had an abstract associated with them; the abstract gave a short description of the document. By representing the abstract as a vector in a high-dimensional space of words, it became possible to search for documents simply by presenting a query in natural language. The query would be transformed into a vector and the vector would then be matched against the vectors for the document collection. A list of the documents with the closest matches would be returned to the user.
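Here is a minimal sketch of that vector-space idea; the three “abstracts” and the query are invented for illustration, and real systems (Salton’s SMART system among them) add term weighting, stemming, and much else. Each document becomes a vector of word counts, the query becomes a vector in the same space, and documents are ranked by the cosine of the angle between the vectors.

    import math
    from collections import Counter

    documents = {
        "doc1": "search algorithms for game trees in computer chess",
        "doc2": "statistical machine translation of russian text",
        "doc3": "syntactic parsing and grammar rules for english sentences",
    }

    def vectorize(text):
        """Represent a text as a bag-of-words vector (word -> count)."""
        return Counter(text.lower().split())

    def cosine(u, v):
        """Cosine similarity between two sparse word-count vectors."""
        dot = sum(u[w] * v[w] for w in u if w in v)
        norm = (math.sqrt(sum(c * c for c in u.values())) *
                math.sqrt(sum(c * c for c in v.values())))
        return dot / norm if norm else 0.0

    def retrieve(query, docs):
        """Rank documents by similarity to a natural-language query."""
        qvec = vectorize(query)
        scores = {name: cosine(qvec, vectorize(text)) for name, text in docs.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    print(retrieve("searching the chess game tree", documents))

Notice that nothing here consults a catalogue; the ranking comes directly from the words of the documents themselves.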

In Salton’s approach the documents in a collection were being queried directly by their contents. That’s quite different from searching a card catalogue for the identifying numbers of books, which can then be used to locate the books on the shelves. The book location system is fundamentally external to the contents of the books. Such a separation between location and content doesn’t exist in the kind of system Salton developed. This is the origin of the vector-based semantics that underlies current work in machine learning.

Interesting, but it’s not like chess

There’s a lot more I could have said about language and computing, but this is enough to make my basic point, which is that language and chess are very different computationally. And that difference can be traced back in large part to their geometric footprints in the world. That takes me back to the end of the first post in this series, where I mentioned the work of Miriam Yevick. In her 1975 paper Yevick was specifically interested in the relationship between the nature of the objects over which a computation is performed and the procedures used to accomplish the computation:

Miriam Yevick, Holographic or Fourier Logic, Pattern Recognition 7, 1975, 187-213, https://doi.org/10.1016/0031-3203(75)90005-9.

She considered geometrically complex objects, such as Chinese characters or any number of natural objects, on the one hand, and geometrically simple objects, such as those studied in plane geometry, on the other. The former required holographic logic, in her terminology, while the latter could use ordinary propositional calculus.

Can her argument be extended to the difference between chess and language, where the former is considered to be geometrically simple and the latter geometrically complex?

Tuesday, March 12, 2024

AI, Chess, and Language 3.1: Search! What enables us to entertain the idea that chess is a paradigmatic case of cultural evolution?

John McCarthy – who, you may recall, coined the phrase “artificial intelligence” – has written a well-known article with the title, “Chess as the Drosophila of AI” (1990). In his introduction McCarthy notes:

One of the pressures I was under came from people in computer science. They sometimes urged me to tackle topics of practical importance and to concentrate on experimental and theoretical work in precisely these applicable areas, as opposed to a backwater such as computer chess. This echoes a remark that might have been made to Thomas Hunt Morgan in 1910: “Elephants are far more useful than fruitflies and who wants better fruitflies? So why don't you do your work in genetics on elephants rather than on fruitflies?” To which Morgan could have countered: “It takes no more than two weeks to breed a generation of fruitflies, you can keep thousands of them in a bottle and they are cheap to feed.”

It is in that spirit that AI researchers have invested so much time and effort in computer chess – though I fear they’re out over their skis when they claim, as some have done, that chess is the apex of human intelligence and thus, when we’ve beaten chess, we’ll have human intelligence licked – and it is in that spirit that I am taking it up in this series of notes.

I don’t understand chess well enough to follow McCarthy’s article, which is based on four examples, but I note that the first example involves search. That’s what I want to think about.

Chess and search

Searching is one of the fundamental tasks of practical computer programming and the study of search is fundamental to computer science. Chess is readily understood as a process of search through a large, but well-defined, space for good moves. In fact, when considered abstractly the chess space is like the tic-tac-toe space, but much larger.

Each game is played on a board of finite size; each game is played with a finite number of pieces; and the number of moves available at each point in play is finite. Tic-tac-toe ends either when one player has three counters in a row or all the board spaces are filled. Chess ends either when one player has been check-mated or when no pieces have been exchanged after a pre-specified number of turns. In both cases the game space takes the form of a tree, where the root is the game state before any move has been made and the leaves are game states where either one player has won or there is a draw.

In both cases the tree is finite. In the case of tic-tac-toe the game tree is relatively small and writing a program to play tic-tac-toe is easy enough that it can be included as an example in a first course in computer programming – I wrote such a program years ago when I was an undergraduate at Johns Hopkins. In the case of chess the tree is huge, much too large to construct even with the largest computer.
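To make the contrast concrete, here is a small sketch of my own (Python, not the undergraduate program mentioned above) that walks the complete tic-tac-toe game tree and counts the finished games at its leaves. It runs almost instantly, which is exactly the sense in which tic-tac-toe’s finitude has practical value; the corresponding walk of the chess tree is hopeless.

    # The eight winning lines on a 3x3 board whose cells are indexed 0..8.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

    def winner(board):
        """Return 'X' or 'O' if someone has three in a row, else None."""
        for a, b, c in LINES:
            if board[a] != ' ' and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def count_leaves(board, player):
        """Count the completed games (leaves of the tree) below this position."""
        if winner(board) or ' ' not in board:
            return 1
        other = 'O' if player == 'X' else 'X'
        return sum(count_leaves(board[:i] + player + board[i + 1:], other)
                   for i, cell in enumerate(board) if cell == ' ')

    # Every distinct complete game, counted as a path from root to leaf.
    print(count_leaves(' ' * 9, 'X'))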

While it is thus impossible to draw a diagram of the complete tree, I’m a visual thinker and I like diagrams. In this case I think a diagram is helpful in making the point that we ARE talking about a space, even if it is impossibly large. So, consider this diagram:

[Diagram: the chess tree]

The first player, White by convention, has 20 possible moves: two possible moves for each of its eight pawns and two for each of its two knights. Thus there are 20 branches off of the root of the tree. Black has twenty possible moves in reply, for a total of 400 branches. Depending on White’s first move and Black’s reply, White will have 20 or more possibilities for its second move, and so forth. The chess tree branches out very quickly.

A little reflection should make it clear that the chess tree, though very large indeed, is (merely) finite. It is so very large that its finitude, unlike that of tic-tac-toe, has no practical value. The size of the chess tree is an example of combinatorial explosion, a concept we will consider again. For now, let us move on.
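One quick way to see that explosion is to count positions ply by ply from the starting position. The sketch below assumes the python-chess library (a real and widely used package; its use here is simply my illustration). It reproduces the 20 first moves and 400 replies discussed above and keeps climbing from there.

    import chess  # pip install python-chess

    def count_positions(board, depth):
        """Count the positions reachable in exactly `depth` half-moves (a 'perft' count)."""
        if depth == 0:
            return 1
        total = 0
        for move in board.legal_moves:
            board.push(move)
            total += count_positions(board, depth - 1)
            board.pop()
        return total

    board = chess.Board()  # the standard starting position
    for depth in range(1, 5):
        print(depth, count_positions(board, depth))
    # Prints 20, 400, 8902, 197281: the first few levels of the chess tree.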

Consequently, chess has obsessed intelligent players for centuries and computer scientists for decades. Finding winning paths turns out to be a very difficult problem. Just how researchers have approached that problem is no doubt interesting, but not to me in the context of this essay. What interests me is the simple fact that we can conceptualize chess as a game involving the exploration of a well-defined mathematical object, a tree, whose properties are well understood, however difficult that exploration may be in practice.

Chess in history and in life

Now, chess wasn’t invented in a day. It evolved over three millennia from roughly 1500 BC to 1500 AD, when it reached its present form. We can think of the game’s evolution since then as one of exploring the chess tree and accumulating knowledge about its properties. To become an expert chess player, not only must you play many games, you must also study the history of the game so that you can acquire some of that accumulated knowledge.

If you don’t mind, I want to go through that again, but a bit differently. It’s important. The chess tree is a real object, as real as that tree outside your window, but abstractly real, not concrete. Just as a child can explore a tree by climbing it, a chess player explores the chess tree by playing the game (and studying games played by others). People have been exploring the physical world ever since we parted ways with our primate ancestors; but it’s only in the last couple of thousand years that we’ve been able to think of the earth as a sphere. Similarly, it wasn’t until the mid-20th century that people began thinking of chess-world as being a finite, but very large, tree. You don’t need to have that conception explicitly in mind in order to explore multiple paths of game-state evolution from the current game state. The fundamental point is that the chess tree is a real object. 

Does the ontogeny of AlphaZero chess-play recapitulate the phylogeny of human chess?

It's an idea in terms of which one can understand concepts, chess concepts. To explore the chess tree is to explore the world of chess concepts. A bit later I want to generalize this approach to think of ideas, any kind of ideas, as existing in some abstract space. Thus when working with those ideas, we’re exploring some space. But let’s get back to chess.

Traditionally, one builds a chess engine by painstakingly hand-coding chess concepts into a program using conventional programming techniques. Roughly speaking we have two kinds of concepts (a sketch in code follows the list):

  • Search: how do you search through the chess tree? and
  • Evaluation: how do you determine the value of a given position?
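Here is a minimal, schematic sketch of how those two kinds of concepts fit together in a classical engine: a depth-limited minimax search with alpha-beta pruning (the search part) that scores leaf positions with a crude material count (the evaluation part). It again assumes the python-chess library for move generation; the evaluation function is a toy stand-in, nothing like the hand-tuned heuristics of Deep Blue or Stockfish.

    import chess  # pip install python-chess

    # Toy material values; real engines use far richer, expert-tuned evaluations.
    VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
              chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

    def evaluate(board):
        """Evaluation concept: score a position (here, crude material balance for White)."""
        score = 0
        for piece in board.piece_map().values():
            value = VALUES[piece.piece_type]
            score += value if piece.color == chess.WHITE else -value
        return score

    def alphabeta(board, depth, alpha=-float("inf"), beta=float("inf")):
        """Search concept: depth-limited minimax with alpha-beta pruning."""
        if depth == 0 or board.is_game_over():
            return evaluate(board)
        if board.turn == chess.WHITE:      # maximizing player
            best = -float("inf")
            for move in board.legal_moves:
                board.push(move)
                best = max(best, alphabeta(board, depth - 1, alpha, beta))
                board.pop()
                alpha = max(alpha, best)
                if alpha >= beta:
                    break                  # prune: Black would never allow this line
            return best
        else:                              # minimizing player
            best = float("inf")
            for move in board.legal_moves:
                board.push(move)
                best = min(best, alphabeta(board, depth - 1, alpha, beta))
                board.pop()
                beta = min(beta, best)
                if alpha >= beta:
                    break
            return best

    print(alphabeta(chess.Board(), depth=3))  # a shallow search from the starting position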

Those routines are programmed in consultation with chess experts. The program that beat Garry Kasparov, IBM’s Deep Blue, was programmed in that way. Until recently, Stockfish, one of the top-rated chess engines, was programmed in that way. You program the engine, put it into matches, and see how it plays. From time to time you update the engine.

Things changed in 2017 when DeepMind released a paper describing AlphaZero, an artificial neural network trained to play Go, Shogi, and Chess:

AlphaZero was trained solely via self-play using 5,000 first-generation TPUs [tensor processing units] to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After four hours of training, DeepMind estimated AlphaZero was playing chess at a higher Elo rating than Stockfish 8; after nine hours of training, the algorithm defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws). The trained algorithm played on a single machine with four TPUs.

AlphaZero had learned to play chess simply by playing chess, game after game after lots of games. No humans supplied it with hand-coded routines for evaluating play.

Later, in 2021, researchers at DeepMind and Google Brain, along with Vladimir Kramnik (World Chess Champion, 2000-2007), published a paper, Acquisition of Chess Knowledge in AlphaZero. One of the things they investigated: “When looking at the evolution of chess understanding within a system like AlphaZero, one has to wonder whether there is such a thing as a natural progression of knowledge, a way of developing an understanding of the game that is specific to the game itself, rather than being purely arbitrary and left to chance – or is it a mix of both?” The latter turns out to have been the case.

Answer: Not quite.

For my present purposes, the details are irrelevant. What’s important is simply that the comparison could be made at all. That possibility depends on conceptualizing chess as a game that’s played in a space that takes the form of a tree. In the human history of the game, humans have explored that tree and left records of that exploration. Now we have a computer system that explores that tree without benefit of priming by humans. How one plays the game is constrained by the rules of chess, but how one explores the possibilities inherent in those rules, that is not dictated by those rules. Rather, it is a product of “discoveries” made in the process of games play. That there are similarities between the history of chess and AlphaZero’s learning history tells us that that process is not an arbitrary one, is not completely captured by local contingency after contingency. That there are differences, however, tells us that there is some free-play in how things unfold.

If you know a bit of intellectual history, you may have recognized that I’ve just run a variation on an idea associated with Ernst Haeckel, who in the late 19th century gave it a succinct formulation, “ontogeny recapitulates phylogeny,” meaning “that an individual organism's biological development, or ontogeny, parallels and summarises its species' evolutionary development, or phylogeny.” In the present context, one might paraphrase Haeckel’s formulation as: the ontogeny of AlphaZero chess-play recapitulates the phylogeny of human chess. It turns out that Haeckel’s formulation is not quite true in the biological case. Nor, it would seem, is my revision true of chess. Still, the parallel is not without interest.

[Note: Fans of John Barth may recall that he ran his own variations on Haeckel in Giles Goat-Boy: “ontogeny recapitulates cosmogeny” and “proctoscopy repeats hagiography.”]

Generalizing wildly, what, if anything, does that tell us about human cultural evolution over historical time? If the evolution of something as austere and tightly structured as chess has room for both determinism and free-play, what about something as luxuriant and open-ended as human life on earth?

But I’m getting ahead of myself. Let’s turn our attention to language.

Bonus: An exercise for the reader

In my penultimate paragraph I suggested an analogy between human cultural evolution and the historical evolution of games-play in chess. Chess is a game played by human beings. In what way is cultural evolution also a game played by human beings, Homo ludens, if you will?

Friday, March 8, 2024

AI, Chess, and Language 2: Further remarks [SPSH]

Yesterday I reflected on the computational approximation of chess play and the computational approximation of linguistic activity: AI, Chess, and Language 1: Two VERY Different Beasts. Today I want to reflect on the actual physical situation, considering it in relation to Saty Chary’s Structured Physical System Hypothesis (SPSH), which stands in contrast to Newell and Simon’s 1976 Physical Symbol System Hypothesis (PSSH). The latter states: “A physical symbol system has the necessary and sufficient means for general intelligent action.” In contrast, the SPSH posits an underlying analog substrate rather than the digital one posited by Newell and Simon. The analog substrate we’re talking about is, of course, the human brain embodied in a human body.

The PSSH implies:

  • that the brain is a physical symbol system, all the way down, and
  • that, because computers are such systems as well, they can adequately simulate/emulate the perceptual and cognitive activities of the human brain.

The SPSH implies:

  • that the brain is a structured physical system, not a physical symbol system, and
  • that digital computers can only approximate the brain’s perceptual and cognitive activities.

Given that human brains can deal with language, they must in some sense be physical symbol systems. But they are not symbol systems all the way down. At the basic level, the brain is just a structured physical system, most likely one exhibiting complex dynamics.[1] My previous chess and language post was about the approximation of those two activities by computational systems, pointing out that the requirements of those systems must be quite different. In this post I am interested in what’s really going on, physically. This will be mostly tautological in character. I just want to make things explicit.

In the case of chess, the board and associated pieces are the physical limit of the chess world. There is nothing more beyond that. Of course, a board and pieces can be physically realized in many different ways, but each realization is a complete and sufficient basis for playing chess. The relationship between the chess world and the geometric footprint required for a computational simulation of it is thus simple and transparent, so much so that a very simple symbolic notation provides an adequate basis for any computer chess engine.

It is quite otherwise in the case of the real world and the operations of the brain in that world. We start with the real world as given to the senses. That is the basis for the primary geometric footprint of any computer system. In particular, that is what defines the possibilities for adhesion in a perceptual-cognitive system.[2] The relationship between the geometric footprint of a computer system and the world is not very well-defined; it is fuzzy and complex.

Through the abstractive capacities of the cognitive system, features of the physical world can be and are redefined and new entities can be introduced into cognition.[3] As examples of the first, consider salt and sodium chloride. The first is an entity given to the senses while the second is based on a 19th-century conceptual system. Similarly, where the senses see two entities, the Morning Star and the Evening Star, astronomers see only one entity, the planet Venus.[4] As an example of the latter, think of charity, as when someone does something nice for someone without thought of reward. That is the mechanism of metalingual definition as discussed in [3] and in this post, Does ChatGPT know what a tragedy is?

Contemporary large language models (LLMs), such as the one at the core of ChatGPT, do not have direct access to the physical world. They must approximate human cognitive capacities through the relationality implicit in existing written texts. It is a matter of some dispute whether or not this relationality, if sampled sufficiently, is an adequate basis for a computer system to achieve AGI, artificial general intelligence. I do not think it is.

Beyond this, we do not know what kind of computational system will be required for a “complete and adequate” simulation of human cognitive capacities. The relationship between the structured physical system that is the brain and the physical world is vast, complex, and ill-defined. It should be obvious from this brief discussion, however, that a computer system that is adequate for chess will not, on the face of it, be adequate for all of human cognition.

* * * * *

[1] I have a working paper where I sketch out a scheme whereby the brain, as a complex dynamical system, can implement language, a symbolic system: Relational Nets Over Attractors, A Primer: Part 1, Design for a Mind, Working Paper, June 20, 2022, pp. 73, https://www.academia.edu/81911617/Relational_Nets_Over_Attractors_A_Primer_Part_1_Design_for_a_Mind

[2] Here I am referring to the three-part scheme for meaning that I have outlined in various places:

  • meaning consists of intention plus semanticity, where intention inheres in the relationship between two speakers, and
  • semanticity consists of adhesion plus relationality, where adhesion connects perception and cognition to the external world and relationality is about the relationships among elements in a perceptual-cognitive system. See e.g. this post: Semanticity: adhesion and relationality.

[3] See, e.g. William Benzon and David Hays, The Evolution of Cognition, Journal of Social and Biological Structures. 13(4): 297-320, 1990, https://www.academia.edu/243486/The_Evolution_of_Cognition

[4] These examples are discussed in William Benzon, Ontology of Common Sense, Hans Burkhardt and Barry Smith, eds. Handbook of Metaphysics and Ontology, Muenchen: Philosophia Verlag GmbH, 1991, pp. 159-161, https://www.academia.edu/28723042/Ontology_of_Common_Sense

* * * * *

Bonus: Consider this clip from Yann LeCun's recent conversation with Lex Fridman:

LeCun's point is simple: The amount of visual (and only visual) information that a four-year-old has taken in is far in excess of the amount of information our largest LLMs have been trained on.

Thursday, March 7, 2024

AI, Chess, and Language 1: Two VERY Different Beasts

Chess has been with AI since the beginning. In fact, computer chess all but predates the term “artificial intelligence,” which was coined for the well-known 1956 Dartmouth summer research program. According to this timeline in Wikipedia the possibility of a mechanical chess device dates back to the 18th century with a chess-playing automaton which, however, was operated by a human concealed inside it. A number of other mechanical hacks appeared before Norbert Wiener, Claude Shannon, and Alan Turing theorized about computational chess engines. John McCarthy invented the alpha-beta search algorithm in 1956, the same year as the AI conference, and the first programs to play a full game emerged a year later, in 1957. That was also the year that Chomsky published Syntactic Structures and the Russians launched Sputnik (which I was able to observe from my backyard).

Meanwhile, in 1949 Roberto Busa got IBM to sponsor a project to create a computer-generated concordance to the works of Thomas Aquinas, the Index Thomisticus. Thus the so-called digital humanities were born. That same year Warren Weaver, who was then at the Rockefeller Foundation, wrote a memorandum in which he proposed a statistical rationale for machine translation. In 1952 Yehoshua Bar-Hillel, an Israeli logician, organized the first conference on machine translation at MIT’s Research Laboratory for Electronics and two years later IBM demonstrated the automatic translation of bits of Russian text into English (Nilsson 2010, p. 148).

The chess gang theorized that, as chess exemplified the highest form of human intelligence, when AI had succeeded in beating the best humans at chess, full artificial intelligence would have been achieved. In 1997 IBM’s Deep Blue beat Garry Kasparov decisively. Ever since then computers have been the best chess players in the world. But computer performance on language tasks has lagged far behind chess performance. The recent development of transformer-based large language models (LLMs) has resulted in a quantum leap in linguistic performance for computers, but the writing, though fluent, is also pedestrian (with various exceptions we need not go into). It would seem that there is a profound difference between the computational requirements of chess and those of language. The difference is not simply a matter of raw compute, with language requiring more, much more. There is also a fundamental, and perhaps even irreducible, difference in the way that compute is orchestrated.

This post offers some quick observations about that difference. I discuss chess first, then language. While I’m a fluent speaker and writer of English, and know a bit about computational linguistics as well, I don’t play chess (though I do know the rules) and I know little about computer chess. I had a brief session with ChatGPT about chess. I’ve included that as an appendix.

Chess as a computational problem

While chess is a fairly cerebral game, it is a game played in the physical world using a board and game pieces. Let us call that chess’s geometric footprint. When we get to language we’ll talk about language’s geometric footprint, which is quite different from chess’s. The rules of chess can be defined with respect to its geometric footprint:

  • There are two players, who alternate moves.
  • It is played on an 8 by 8 board.
  • Each player has 16 pieces distributed over 6 types.
  • The moves of each type are rigidly and unambiguously specified.
  • There are other rules regarding games play among those pieces on the board.

The only constraint the players are subject to is the constraint that they obey the rules of the game. Any move that is consistent with the rules is permitted.

Given that the number of squares on the board is finite, that the number of pieces is finite, that each play involves finite movement on the board, and that a convention is adopted to terminate play in the case that no pieces are being exchanged, the total number of possible chess games is finite and takes the form of a tree.

To play chess at even a quite modest level one must have a repertoire of tactics and strategies that are not specified by the rules. Chess is a game where there is a strict distinction between the basic rules and what we might call the elaboration, that is, the tactics and strategies governing games play. As a practical matter, large ill-defined areas of the chess tree are unexplored because, once a player moves into any of those areas, they will lose to a superior opponent. In particular, the opening of a game is quite restricted, not by the rules, but by well-known tactical considerations.

Given that the game is defined in terms of its geometric footprint, and that all possible chess games can be enumerated in the form of a tree, it follows that a person’s ability to play chess depends on how well they know the chess tree. However it is that they represent this tree in their minds is, at this point, a secondary issue. Given two players, if one of them knows a larger and more interesting (however one specifies interesting, a difficult problem) region of the chess tree than the other, they will consistently win over the other.

It follows therefore, that since computers now consistently beat even the best of human players, they have explored regions of the chess tree that no human player has.

Language as a computational problem

Now let us consider human language. It has a geometric footprint as well, which is given in language’s relationship to the natural world. The meaning of a good many words is given directly by physical phenomena; much of so-called common-sense knowledge is like this. Unlike the geometric footprint of chess, which is small, simple, and well-defined, the geometric footprint of language is large, complex, and poorly defined. I would note further that words that are not given their primary meaning in physical terms, as given in the human sensorium, can be given meaning by various means, including patterns of words and patterns that include formal symbols as well: symbols from mathematics, chemistry, physics, and so forth. On this last point, see various posts tagged metalingual definition, and two papers that I wrote with David Hays:

William Benzon and David Hays, Metaphor, Recognition, and Neural Process, The American Journal of Semiotics, Vol. 5, No. 1 (1987), 59-80, https://www.academia.edu/238608/Metaphor_Recognition_and_Neural_Process

William Benzon and David Hays, The Evolution of Cognition, Journal of Social and Biological Structures. 13(4): 297-320, 1990, https://www.academia.edu/243486/The_Evolution_of_Cognition

It is thus difficult to make a firm distinction between the basic rules of language and the elaboration. The number of possible chess games is finite, though it is so large that we cannot list them. It makes little sense even to talk of listing all possible language texts. The number is unbounded and the set is not enumerable.

The basic rules of chess are so simple that computers need not play chess by moving physical pieces around on a board; a purely symbolic notation is entirely adequate. The training of LLMs does not involve access to the physical world. But the geometric footprint of language so constrains semantic relationships that LLMs can induce a suitable approximation of those relationships given a sufficiently large training corpus. But we have no way of determining whether or not an LLM can generate any possible text. Nor do we have any reason to believe that LLMs can generate any text that can be generated by a human having full access to the physical world. In fact, given the problems that LLMs have with common-sense knowledge, we have reason to believe that LLMs are more limited in their capacities than the collectivity of humans.
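On that first point, that a purely symbolic notation is entirely adequate: the complete state of a chess game fits in one short string of characters (Forsyth-Edwards Notation). A small illustration, again assuming the python-chess library; the particular opening moves are just an example.

    import chess  # pip install python-chess

    board = chess.Board()
    for san in ["e4", "e5", "Nf3", "Nc6", "Bb5"]:  # the first moves of a Ruy Lopez
        board.push_san(san)

    # One line of text captures the entire state of the game.
    print(board.fen())
    # r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3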

Computationally, chess and natural language are very different beasts. The former is tractable and considered solved while the latter is still somewhat mysterious. Chess was solved by the methods of classical symbolic computing, methods which we understand. We know how classical chess engines work. But the newer neural network chess engines are black boxes, as are LLMs. We don’t (yet) know how they work. Moreover, while it is clear that LLMs are more generally successful with language than the old symbolic systems were, it does seem that the older symbolic systems are better at some kinds of reasoning and planning tasks, and it is not obvious that LLMs can acquire those capacities.

Two kinds of computing and an exercise for the reader

I have written various posts about the work of Miriam Yevick. See, for example: Toward a Theory of Intelligence: Did Miriam Yevick know something in 1975 that Bengio, LeCun, and Hinton did not know in 2018? In particular, Yevick proved that in the task of visual identification, symbolic systems (like old-style chess engines) are better than neural nets. Does her result generalize across the whole range of tasks we would require of our intelligence engines? If so, then it means that we cannot reach AGI (whatever that is) simply by making larger and larger LLMs.

Appendix: ChatGPT on computer chess

Is there any evidence that good computer chess programs sometimes play differently than the best humans?

Yes, there is evidence that computer chess programs, especially the strongest ones, can play differently than the best human players. Computer chess engines rely on brute-force calculations and sophisticated algorithms to evaluate positions and make decisions, while human players often rely on intuition, pattern recognition, and strategic understanding.