Saturday, January 31, 2026
Game theory, MAGA, and abortion: The Trumperor’s varied partisans
Let us assume that the story of the emperor who has no clothes is about the conversion of shared knowledge into common knowledge. Everyone, the emperor himself, his courtiers, the population at large, and of course those rogue tailors, everyone can see that he is naked. That knowledge is shared among them. For the rogue tailors, however, that knowledge is common; each knows that the other knows that the other knows, and so on ad infinitum.
Once the boy blurts out, “He’s naked!!!,” though, what was only shared knowledge has now become common among the entire populace. Everyone knows that everyone else knows, and so on. Let us suppose, however, that some people would just as soon not know that the emperor is naked, while others don’t mind knowing among themselves (e.g. those rogue tailors) but would just as soon that no one outside their group knew. What then?
Those rogue tailors might make a move to mute the boy’s message, perhaps by disparaging him and his parents, perhaps even by imprisoning him. Those courtiers, who’d just as soon believe the emperor to be clothed since their livelihood depended directly on his largesse, would see the rogue tailors making their move and would join them in their efforts. Surely there were others in the population who also benefited from the emperor’s power and wealth. They too would join in the effort to silence the boy.
Now the population is split in two: the party of the emperor and his supporters, and the other party, at best indifferent to the revelation, though many are quite gleeful, since the emperor and his cronies had done them no good at some time in the past. This other party sees the boy as a hero and attempts to protect him.
At this point the parable stops being a children’s story and becomes a model of political coalition formation under epistemic stress. For it’s not this imaginary emperor that interests me. It’s a very real Donald J. Trump.
I’m casting him in the role of the naked emperor. He is widely perceived as corrupt, even among many of his allies. Thus the Big Boys of Silicon Valley are attracted to him because they want to benefit from his political power. They see that he’s also cruel and sexually profligate (or at least he was in the not so distant past, who knows these days). They may not like these characteristics in him, but they can tolerate them as long as he supports them in their business ventures. However, they cannot themselves afford to be seen as corrupt, at least not all that corrupt, just business as usual. Fortunately he has plenty of support from other businessmen. And it would help if Trump had significant support from those who favor him on other grounds.
For that we have the MAGA faithful, which, as far as I can tell, is a motley crew. Many of them may not like immigrants, but as long as he promises to kick them out, they couldn't care less about his licentiousness. As for his cruelty, as long as he directs it at those immigrants, it's fine by them.
But we also have the Christian right, many of whom abhor that licentiousness, but who also abhor abortion. As long as abortion is confined to secretive doctors who hide their services and to doubtful practitioners who slink around back alleys in the dark, these people may not say much. But once public abortion clinics become available, that cannot be allowed. If Trump will appoint Supreme Court justices who are with them on this, however, they’ll support the Trumperor. He did and they do.
Finally, among those who abhor abortion we have those who insist that a woman who has been raped must carry the fetus to term. Why? To allow the abortion is to admit that the rape took place, that men can be, and all too often are, cruel. Or perhaps they want to blame the woman herself for the rape: “She was askin’ for it.” Well, as long as DJT gets rid of abortion, he can do whatever he wishes with women, just don’t tell us about it. These are not necessarily explicit beliefs held by all individuals, but they are functional outcomes of the system.
* * * * *
ChatGPT created the illustration above based on the following photograph, taken at a Day of the Dead celebration in Jersey City in November of 2015:
Moltbook – A community for AIs
What's currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People's Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately. https://t.co/A9iYOHeByi
— Andrej Karpathy (@karpathy) January 30, 2026
Astral Codex Ten, Best of Moltbook, Jan 30, 2026.
Friday, January 30, 2026
Teaching AIs how to draw semantic network diagrams, and other things
In June of last year I decided to ask ChatGPT to draw a semantic network diagram for Shakespeare's Sonnet 129. Why did I choose that task? Because it is something that humans can do, but it is not rocket science; it doesn't require genius-level capability. I wanted to put a bound on all the hype about LLMs already being AGIs (whatever they are), or close to it. I chose ChatGPT because the task requires the ability to draw, and ChatGPT can draw.
I wrote up the experiment in this working paper: ChatGPT tries to create a semantic network model for Shakespeare's Sonnet 129 (June 16, 2025). Here's the abstract:
This document explores the capacity of large language models, specifically ChatGPT, to construct semantic network models of complex literary texts, using Shakespeare's Sonnet 129 as a case study. Drawing on the author's prior work in cognitive modeling, the analysis reveals that ChatGPT, while capable of producing linguistically coherent commentary, fails to generate a structurally plausible semantic network for the sonnet. The failure is traced not to a lack of exposure to relevant literature, but to the model's lack of embodied, interactive learning. The process of constructing cognitive network diagrams is shown to be iterative, visual-verbal, and skill-based, comparable to learning a physical craft like playing an instrument or woodworking. It requires extended practice under expert feedback, enabling a form of reasoning that is neither algorithmic nor easily reducible to textual description. The essay argues that this hybrid modeling skill represents a "deep" human capability that is nevertheless teachable and routine. It concludes with reflections on the nature of such skills and their implications for AI, pedagogy, and literary interpretation. Asking ChatGPT to create a semantic model for a Shakespeare sonnet.
About a week ago I had a long dialog with ChatGPT, first about how humans learn this task and then, second, about what it would require to teach AIs how to learn it. From there we went on to the more difficult task of probing the structure latent in the weights of an LLM and thereby extracting/constructing a map of that structure. Here's ChatGPT's summary of that discussion.
Summary: Learning Semantic Nets, Mapping Latent Structure in LLMs, and the Hybrid Future of Intelligence
We began by returning to a practical question that had surfaced in earlier work: why it is difficult for an AI system to produce good semantic or cognitive network diagrams for literary texts (such as a Shakespeare sonnet), even though a capable human can learn to do it. The core issue is not that such diagramming requires genius-level insight. Rather, it is a form of skilled representational practice—learnable by apprenticeship, but challenging for current AI systems because it requires disciplined structure-building, normative constraint satisfaction, and iterative repair.
1) Apprenticeship and “inside knowledge” of diagramming
A central theme was the idea that semantic network diagrams cannot be properly understood from the outside. A person learns to read them only by learning to produce them. This applies even more strongly to complex cognitive networks, where a diagram can appear meaningless unless one has internalized the diagram grammar: node types, arc types, compositional constraints, and “house style” norms about what counts as an admissible analysis.
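To make the notion of a diagram grammar concrete, here is a minimal sketch in Python of a typed network with an admissibility check. The node and arc types are illustrative placeholders chosen for the example, not Hays' actual inventory.

```python
from dataclasses import dataclass, field

# Illustrative type inventories; Hays' actual system is far richer.
NODE_TYPES = {"entity", "event", "property"}
ARC_TYPES = {
    # arc type: (required source node type, required target node type)
    "agent": ("event", "entity"),
    "patient": ("event", "entity"),
    "attribute": ("entity", "property"),
}

@dataclass
class Node:
    label: str
    ntype: str

@dataclass
class SemanticNet:
    nodes: dict = field(default_factory=dict)   # label -> Node
    arcs: list = field(default_factory=list)    # (arc_type, src_label, dst_label)

    def add_node(self, label, ntype):
        assert ntype in NODE_TYPES, f"unknown node type: {ntype}"
        self.nodes[label] = Node(label, ntype)

    def add_arc(self, atype, src, dst):
        """Add an arc only if the grammar licenses it (the admissibility check)."""
        src_t, dst_t = ARC_TYPES[atype]
        if self.nodes[src].ntype != src_t or self.nodes[dst].ntype != dst_t:
            raise ValueError(f"'{atype}' arc not licensed from {src} to {dst}")
        self.arcs.append((atype, src, dst))

# Toy fragment: a single attribution from Sonnet 129.
net = SemanticNet()
net.add_node("spirit", "entity")
net.add_node("expended", "property")
net.add_arc("attribute", "spirit", "expended")
print(net.arcs)
```

The "house style" norms live in the `ARC_TYPES` table and the check in `add_arc`; learning the grammar means internalizing a much larger and subtler version of that table.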
You offered an idealized description of your training with David Hays. The training regime resembled a repeated cycle of supervised practice:
- you brought a fragment of text and a diagram you had produced,
- Hays evaluated it (approve/disapprove) with commentary,
- you revised or moved forward accordingly,
- the cycle repeated,
- and over time the normative discipline of diagramming became internalized.
You also noted that this same pattern governed group work among peers who had learned the system: a collaborative problem was brought to the table, and discussion plus sketching continued until a coherent solution emerged. The key was not merely producing diagrams, but learning the discipline that makes diagrams meaningful and correct.
From this, you proposed an account of what is being learned: a repertoire of correspondences between verbal fragments and diagram fragments. Under that view, diagramming competence is partly the acquisition of a “library of moves,” where particular linguistic patterns or conceptual pressures cue specific diagram operations. Equally important, however, is a critic’s sense of global coherence—a normative capacity to judge whether a graph “hangs together” as a model of the text and to identify what must be repaired.
You emphasized that at any time there is a locally stable diagram grammar, even if it cannot be complete in principle. In your own case, you began with Hays’ textbook Mechanisms of Language and learned to produce diagrams specified in particular chapters (cognition, perception). After three months of concentrated training you had internalized the system well enough not merely to use it, but to extend it: you proposed a new arc type, specified its assignment conditions, and demonstrated its usefulness. This was identified as an important marker of mastery: moving from conforming to norms to making responsible innovations within the normative system.
2) Why this is “easy” for humans but hard for AI
The conversation then turned to the striking asymmetry: semantic network diagramming is learnable by humans with patience and guidance, but remains difficult for AI systems. The difficulty is not lack of general linguistic ability; it is that diagramming requires explicit normative structure and repair behavior. Humans develop an internal sense of error: what is missing, what violates the grammar, what is incoherent globally. Current models often produce plausible fragments but struggle to maintain consistent typing, global integrity, and systematic revision under critique.
This diagnosis led to an important idea: it would be possible for AI to learn semantic network construction through an analogous apprenticeship regime—especially if the AI were multimodal (since the target representation is graphical). Training would require expert-guided correction cycles, ideally including revision histories, so that the system learns not only what the final diagram should look like, but how to repair incorrect diagrams.
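A minimal sketch of what a single training record in such an apprenticeship regime might contain; the field names and structure are assumptions for illustration, not something the conversation specified.

```python
from dataclasses import dataclass

@dataclass
class RevisionStep:
    diagram: str    # a serialized candidate diagram (e.g., JSON or DOT)
    critique: str   # the expert's commentary: what is missing, what violates the grammar
    accepted: bool  # approve / disapprove

@dataclass
class ApprenticeshipRecord:
    text_fragment: str             # the passage being diagrammed
    revisions: list[RevisionStep]  # the full repair history, not just the final diagram

# Keeping the whole revision history is the point: the system can then learn
# repair behavior, not merely imitate finished diagrams.
example = ApprenticeshipRecord(
    text_fragment="Th' expense of spirit in a waste of shame",
    revisions=[RevisionStep(diagram="{}", critique="no event node for 'expense'", accepted=False)],
)
```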
At the far horizon, you raised a more ambitious possibility: AIs might learn diagramming so well that they could teach other AIs, performing the Hays-function themselves. That would require not only competence in diagram production, but competence in critique, repair, curriculum sequencing, and controlled extension of the grammar.
3) From diagramming text to extracting latent structure from neural weights
This discussion provided what you described as your first hint toward a larger goal: extracting cognitive-level network structures from foundation models. You contrasted this with Gary Marcus’ suggestion of investing enormous resources into hand-coded symbolic modeling. You argued that building a gigantic semantic net by armies of humans is madness. Instead, the semantic network “lives” implicitly in the weights of neural models—diffused across parameters—and the research problem is to map it, extract it, and make it explicit.
You described your working intuition: LLMs would not be so effective if they did not embody cognitive-network-like structures at some latent level. You also noted that you had conducted behavioral experiments (using only ordinary user access) that convinced you of this: controlled perturbations lead to distributed ripple effects that preserve story coherence. These results suggest that constraint structure is present, even if not symbolically explicit.
From this perspective, “ontology extraction” becomes an empirical, stochastic mapping discipline. One does not directly read networks off the weights. Instead, one probes behavior, perturbs conditions, observes stable patterns, and assembles inferred structures under an explicit representational grammar. The diagram grammar becomes essential as a way to turn a cloud of samples into a stable map.
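A minimal sketch of that probe-perturb-aggregate loop. Both `query_model` and `extract_relations` are hypothetical stand-ins (no real API is being named); the idea is only that relations which survive many perturbations are candidates for the latent network, while one-off relations are treated as sampling noise.

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM."""
    raise NotImplementedError

def extract_relations(text: str) -> set:
    """Hypothetical parser returning (head, relation, tail) triples found in a response."""
    raise NotImplementedError

def probe(concept: str, framings: list, trials: int = 5) -> Counter:
    """Probe a concept under varied framings and count which relations keep recurring."""
    counts = Counter()
    for framing in framings:
        for _ in range(trials):
            counts.update(extract_relations(query_model(framing.format(concept=concept))))
    return counts

framings = [
    "Tell a short fairy tale about a {concept}.",
    "Describe a {concept} to a child.",
    "Write a news report in which a {concept} appears.",
]
# stable = {r for r, n in probe("dragon", framings).items() if n >= 10}
```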
An important complication was introduced here. Hays’ symbolic framework in Mechanisms of Language covers multiple layers: syntax, morphology, pragmatics, phonetics/phonology, cognition, perception. In contrast, LLMs are trained on token strings in which many of these levels are conflated. Thus any network extracted from the weights risks being entangled across linguistic and cognitive layers. You expressed the desire for a “pure cognition” network, but acknowledged that it is not clear how to achieve purity a priori. The practical conclusion was to proceed anyway, while explicitly tracking the issue, allowing the research program to evolve in execution rather than being blocked by the impossibility of perfect factorization at the outset. You also suggested a sensible calibration strategy: hand-code sharply limited domains to provide gold standards for evaluating automatically derived networks.
4) The generational scope: the birth of a field
You then widened the frame. The task is not merely technical. It is about how minds conceptualize the world, and not one mind but the historical product of millions or billions of minds writing across centuries, with bias toward recent decades. This is not a problem solvable by a single dissertation or a single lab over a few years. It requires many labs working in loose coordination, with both collaboration and competition, over one or more intellectual generations. In this view, foundation models are not “the pinnacle,” but the floor—the starting point—for a long new intellectual adventure.
In that context we coined useful names for two failure modes in contemporary AI thought: “hand-coded scholasticism” (the belief that meaning must be explicitly authored by armies of humans) and “scaled-up millenarianism” (uncritical faith that scaling alone will magically solve everything). You described these as the Scylla and Charybdis of current discourse, and emphasized that your program aims at a third path: mapping the latent wilderness systematically, with discipline and instrumentation.
5) Production systems and Yevick’s mode-switching intelligence
Finally, we returned to architecture. If diagramming skill is a library of pattern-to-pattern correspondences plus a critic enforcing coherence, then a classical production system architecture becomes attractive. A production system naturally supports staged rule application, working memory updates, constraint checking, and repair cycles. Neural models can supply candidate relations and associations, while the production system supplies explicit normativity and structural discipline.
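A minimal sketch of that hybrid loop, with the neural side reduced to stubs that propose candidate triples and fill missing roles, and the production system reduced to one toy rule plus a toy critic. It is meant only to show the control structure (match, act, check, repair), not a serious implementation.

```python
def propose_candidates(fragment: str) -> set:
    """Stub for the neural side: candidate (head, arc, tail) triples for a text fragment."""
    return {("expend", "patient", "spirit")}

def suggest_filler(event: str, role: str) -> str:
    """Stub for asking the neural side to fill a missing role."""
    return "lust" if (event, role) == ("expend", "agent") else "UNKNOWN"

def rule_require_agent(wm: set) -> list:
    """Toy production: every event that takes a patient should also have an agent."""
    events = {h for (h, a, t) in wm if a == "patient"}
    agents = {h for (h, a, t) in wm if a == "agent"}
    return [("agent", e) for e in events - agents]

PRODUCTIONS = [rule_require_agent]

def critic(wm: set) -> bool:
    """Toy global-coherence check: nonempty graph, no unknown fillers left."""
    return bool(wm) and all(t != "UNKNOWN" for (_, _, t) in wm)

def build(fragment: str, max_cycles: int = 10) -> set:
    wm = set(propose_candidates(fragment))   # working memory: the diagram under construction
    for _ in range(max_cycles):
        repairs = [r for rule in PRODUCTIONS for r in rule(wm)]
        if not repairs and critic(wm):
            return wm                         # stable, coherent diagram
        for role, event in repairs:
            wm.add((event, role, suggest_filler(event, role)))   # repair cycle
    return wm

print(build("Th' expense of spirit in a waste of shame"))
```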
This hybrid framing connects directly to Miriam Yevick’s work on holographic/Fourier logic versus sequential propositional logic. You emphasized that your current program is not merely compatible with Yevick’s ideas; it grew in part out of sustained reflection on them. You and Hays argued in 1990 that natural intelligence requires the capacity to deploy both modes, and you developed this further in speculative work on metaphor. In metaphor, the propositional system regulates the superimposition of holistic gestalts: e.g., Achilles in battle is likened to a lion in battle. The two scenes function as holographic wholes, while sequential linguistic propositions step through correspondence constraints. This provides a concrete mechanism for the hybrid intelligence thesis.
You concluded by noting the historical hinge: when you and Hays were working, the technical means for operating at scale on these ideas did not exist. Now they do. And Hays himself played a foundational role in building the early symbolic infrastructure of computational linguistics (machine translation at RAND, coining the term “computational linguistics,” founding editorship and institutional leadership in COLING). In effect, the present moment makes possible an extension of that lineage: not abandoning symbolic structure, but using symbolic grammars and production discipline to extract, organize, and refine the latent cognitive structures that neural models already embody.
Thursday, January 29, 2026
How do we credit hybrid images?
Around the corner from here, over at 3 Quarks Daily, I’ve published an article I wrote in conjunction with both ChatGPT and Claude. How should that article be credited? How do we characterize the contribution of each agent and how do we indicate that characterization? I discuss these issues at the end of the article.
The same issues can arise with visual images. All of these images were rendered by ChatGPT. But the renderings were done on a different, a different what? Basis? Substrate? Seed?
In the first two images, I uploaded one of my photographs to ChatGPT and asked it to add something to it. In the case of the first photo, I wanted to see the Millennium Falcon flying into the iris. The second photo is of a scene in Liberty State Park into which I had ChatGPT place a photo of an Indian woman in a sari eating McDonald’s French fries.
This image is a bit different. I gave ChatGPT a photo of a scene in Jersey City and asked it to turn it into a futuristic scene.
For this image I gave ChatGPT a photo of a painting I’d done as a child and asked it to render it in the style of Hokusai.
In this last case I gave ChatGPT a document that I wrote and then asked it to create an image that would be an appropriate frontispiece for it. This image is quite different from the one it originally produced. I had to do quite a bit of art direction to obtain this final image.
The question then is: Imagine that these images were on display in, say, a museum. How should they be credited? In all cases the final image was rendered by ChatGPT. But the substrate varied, as did the prompting which instructed ChatGPT in generating the image. For example, in the first four cases we could indicate “Original photograph by William Benzon.” For the last, “Original text by William Benzon” and “Art Direction by William Benzon.” Do I give myself an art direction credit on the others as well? What kind of credit should ChatGPT get? “Realization and Rendering by ChatGPT” might be sufficient for the first two. For the third and fourth, “Transformation and Rendering.” The last? Perhaps “Transmutation and Rendering.” Whatever the nature of the credits, they’re only meaningful if the audience already knows something about the process through which the images were produced.
Bari Weiss: Last Week Tonight with John Oliver (HBO)
Oct 13, 2025
John Oliver discusses the Paramount Skydance merger, how they’ve named Bari Weiss the new Editor-in-Chief of CBS News, and what her editorial history and perspective mean for the future of U.S. journalism. Plus, some notable moments in the history of the Upper West Side.
Wednesday, January 28, 2026
Rough Notes on Virtual Reading, On literary study in the Fourth Arena
Title above, links, Abstract, Introduction, and Summary Below.
Academia.edu: https://www.academia.edu/150286029/Rough_Notes_on_Virtual_Reading
SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6145009
ResearchGate: https://www.researchgate.net/publication/400147214_Notes_on_Virtual_Reading_On_literary_study_in_the_Fourth_Arena
Abstract
This discussion develops a state-space framework for linking brains, texts, and literary history in a way that extends both traditional interpretation and current digital humanities methods. We begin with a neuroanatomical asymmetry: large language models (LLMs) operate primarily on linguistic traces, whereas human poetic production and reception emerge from whole-brain dynamics that include affect, memory, perception, attention, and bodily regulation. If meaning is understood in the language of complex dynamics, it is not a static property “contained” in words but a temporally unfolding trajectory through a high-dimensional cognitive state space. Texts are therefore treated as traces of such trajectories.
From this premise we propose virtual reading: since a text necessarily projects into lexical–semantic activity, and since word embeddings provide a tractable high-dimensional geometry for lexical relations, a text can be modeled as a path through embedding space. While this path reflects purely lexical structure, its global form—drift, recurrence, looping, discontinuity, return—also bears the imprint of extra-lexical constraints that shape lexical choice. In principle, neuroimaging of readers during reading supplies a second coupled trajectory (whole-brain activity over time), enabling empirical alignment between semantic paths and brain dynamics. Drawing on Walter Freeman and Hermann Haken, poetic form is framed as a cultural technology of dimensionality reduction: it extracts low-dimensional, shareable coordinates from otherwise intractable semantic dynamics.
Finally, we connect micro-trajectory analysis to macro-history via cultural evolution. Quantitative DH findings on directional change in large corpora (e.g., similarity structures that spontaneously align with time) become intelligible as movement through a cultural “design space.” The approach does not dissolve disciplinary differences, but provides a richer conceptual arena where close reading calibrates computational exploration, and state-space models open new pathways for scholarly and public understanding of literature as dynamics in time.
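For readers who want a concrete sense of the "virtual reading" move, here is a minimal sketch: embed the words of a text in order and examine the global shape of the resulting path. The embedding here is a random-vector placeholder; an actual study would use trained word or sentence embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
_vectors = {}

def embed(word: str, dim: int = 50) -> np.ndarray:
    """Placeholder: a fixed random vector per word. A real analysis would use
    trained embeddings (word2vec, GloVe, or an LLM encoder)."""
    if word not in _vectors:
        _vectors[word] = rng.normal(size=dim)
    return _vectors[word]

def trajectory(text: str) -> np.ndarray:
    """Treat the text as an ordered path through embedding space."""
    return np.stack([embed(w) for w in text.lower().split()])

def drift(path: np.ndarray) -> float:
    """Net displacement from the first point to the last."""
    return float(np.linalg.norm(path[-1] - path[0]))

def recurrence(path: np.ndarray, radius: float) -> int:
    """Count later points that return close to earlier ones (looping, return)."""
    d = np.linalg.norm(path[:, None, :] - path[None, :, :], axis=-1)
    return int(np.triu(d < radius, k=1).sum())

line = "the expense of spirit in a waste of shame is lust in action"
path = trajectory(line)
print(f"drift = {drift(path):.2f}, recurrences = {recurrence(path, radius=1.0)}")
```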
Introduction: This is a strange way to assemble a working paper
Over the last 15 years or so I’ve written a bunch of working papers and posted them to the web. Most of them consist of expository prose from beginning to end, and a number of them have a few, or many, diagrams of one kind or another. A few of them are argued as carefully as a formal academic paper, though perhaps not so dense with supporting apparatus. Most of them are not so formal; some are more like popular scientific writing, though not on standard scientific topics; others are even more relaxed. But coherent prose, all of them, sentences and paragraphs, some headings and subheadings. That’s it.
This working paper is different. It’s a transcript of a long conversation I had with ChatGPT that began with the functional organization of the human brain and ended up somewhere beyond those pesky Two Cultures that so many earnest academics like to rattle on about. In between I talk about something I call virtual reading, which involves literary texts, high-dimensional lexical spaces and computing. Then I toss in brain study. After that it gets complex. Here and there we have some longish passages of prose, but mostly it’s one to three sentences at a time strung between a passel of bulleted lists and a blither of headings and subheadings. Not prose.
Why would I inflict that on you? Two reasons: 1) if you think carefully about it, it turns out to be challenging and interesting, and 2) I just don’t have time to turn it all into properly argued prose.
This working paper is based on a dialog I had with ChatGPT 5.2 on January 16, 17, and 18, 2026. Most of it is, in fact, an almost direct transcription of that dialog. Why would I issue such a crude and unpolished text?
I note, first of all, that you do not have to read that transcript if you are curious about what’s in this document. I have provided both an abstract (288 words) and a summary (846 words), both created by ChatGPT, so you don’t have to slog through the transcript itself. If you want details, though, you’ll find them there.
Note, furthermore, that here and there throughout the dialog you’ll find islands of coherent prose. ChatGPT produced some of them without prompting from me; these tend to be single paragraphs. It generated others in response to prompts from me; these tend to be multi-paragraphed, and somewhat long. Look for them. Finally, look for the hyperlinks ChatGPT embedded in the text.
What’s the Fourth Arena?
You may be wondering about that “Fourth Arena” in the title. It also shows up in the text. Here it is: “Fourth Arena” is a term I am using to refer to an emerging domain beyond matter, life, and culture, made possible by the deep integration of humans with computational systems. As AI increasingly participates in cognition, memory, and coordination, new hybrid forms of agency arise that are neither purely human nor merely mechanical. In this sense, the Fourth Arena echoes Pierre Teilhard de Chardin’s idea of a noosphere—a new layer of collective mind—but grounds it technologically and institutionally. Its defining shift is not greater efficiency, but a reorientation of human value away from work and toward play, meaning, and shared exploration.
LLMs, hallucinations, and language as cultural technology @3QD
I’ve got a new article at 3 Quarks Daily:
Of Grammar and Truth: Language Models and Norms, Truth and the World
I start with an obscure topic in linguistics, evidentials, and then move on to so-called hallucinations and into the Gopnik, Farrell, Underwood account of AI as cultural technology. I conclude the article by explaining how I got Claude to create the text and discussing the issues that raises for attribution.
The penultimate section is entitled: What Language Turns Out to Be: Mechanistic. But I never really explain that. I’m going to do that here.
Or rather I’m going to let Claude explain:
The success of modern chess programs and large language models shows that language and reasoning are mechanistic, but not in the familiar steam-engine sense of mechanism. These systems are better understood as machines with trillions of interacting parts, whose behavior emerges from distributed internal dynamics rather than from transparent, human-scale causal chains. Such mechanisms operate autonomously: once set in motion, they carry out sustained symbolic activity without continuous human or animal control. This autonomy is not accidental; it is the defining consequence of scale. Just as early steam locomotives violated pre-industrial ontologies by exhibiting self-propelled motion without life, contemporary computational systems violate inherited ontologies by exhibiting structured linguistic and cognitive behavior without minds. What we are confronting is not the end of mechanism, but the emergence of a new kind of mechanism—one that forces us to revise the categories by which we distinguish agency, control, and understanding.
We decided that steam-engine mechanisms are best called equilibrium machines while machines of a trillion parts are generative machines:
By equilibrium machines I mean mechanisms designed to settle into stable, repetitive behavior, minimizing deviation and surprise. These are the machines of the Industrial Revolution, and they underpin the worldview of Homo economicus. By generative machines I mean mechanisms maintained far from equilibrium, whose internal dynamics produce structured novelty and exploration. Language is the paradigmatic generative machine, and Homo ludens is the form of life that emerges when such machines become central rather than marginal.
The world of Homo economicus is organized around equilibrium mechanisms: machines designed to settle, repeat, and minimize deviation. These are the mechanisms of the Industrial Revolution, whose success shaped not only our technologies but our intuitions about causality, control, and value. Homo ludens inhabits a different world. Its characteristic institutions and practices arise from generative mechanisms—systems maintained far from equilibrium, whose internal dynamics support exploration, play, and the continual production of novelty. Human freedom does not stand opposed to such mechanisms; it depends on them.
This allows me to observe (in Claude’s words):
Human freedom and creativity are not opposed to mechanism. They are grounded in a special class of mechanisms—decoupled, autonomous mechanisms whose internal standards of coherence allow sustained activity independent of immediate worldly constraint. Language is paradigmatic of this class.
That is an idea I’ll be developing in my book, Play: How to Stay Human in the AI Revolution.
Why Mechanistic Interpretability Needs Phenomenology: Studying Masonry Won’t Tell You Why Cathedrals Have Flying Buttresses
Early in my work with ChatGPT I was intrigued by some results in mechanistic interpretability (MI). After a while, though, I lost interest. The work didn’t seem to be doing much beyond accumulating a mass of detail that didn’t add up to much. Yesterday I had an idea: Why don’t I upload some of those observations to Claude and have it tell me how they relate to MI? Here’s what it told me:
* * * * *
The problem isn't that MI's methods are bad. Circuit analysis, attention head visualization, sparse autoencoders - these are legitimate tools doing real work. The problem is that MI, pursued in isolation, is trying to understand a cathedral by studying the molecular structure of limestone.
You can measure every stone. Map every stress pattern. Identify load-bearing arches. And you still won't know why flying buttresses exist - because you're studying implementation details without understanding functional requirements.
The Phenomenology Deficit
Here's what I mean. Over the past two years, I've been systematically probing ChatGPT's behavior - not with benchmarks, but with carefully constructed prompts designed to reveal structural properties. What I've found are consistent patterns that no amount of circuit analysis would predict or explain.
Example 1: Ontological Boundary Enforcement
Give ChatGPT a story about a fairy tale princess who defeats a dragon by singing. Ask it to retell the story with a prince instead. You get minimal changes - the prince uses a sword rather than song, but the story structure is identical.
Now ask it to retell the same story with "XP-708-DQ" as the protagonist. The entire ontology shifts. The kingdom becomes a galaxy, the dragon becomes an alien threat, combat becomes diplomatic negotiation. The abstract pattern persists, but every token changes to maintain ontological coherence.
Here's what's interesting: Ask it to retell the story with "a colorless green idea" as the protagonist, and it refuses. Not with a safety refusal - with a coherence refusal. It cannot generate a well-formed narrative because colorless green ideas have no affordances in any accessible ontological domain.
What MI sees: Some attention patterns activate, others don't. Certain token sequences get high probability, others near-zero.
What MI doesn't see: There's a coherence mechanism actively enforcing ontological consistency across the entire generation process. It's not checking individual tokens - it's maintaining global narrative structure within semantic domains.
The Three-Level Architecture
Transformation experiments reveal something even more fundamental: LLMs appear to organize narratives hierarchically across at least three levels.
Level 1: Individual story elements (princess, dragon, kingdom)
Level 2: Event sequences and causal chains (protagonist encounters threat → confronts threat → resolves threat)
Level 3: Abstract narrative structure (hero's journey, quest pattern, sacrifice arc)
When you transform Aurora → Harry, Level 1 changes (princess → prince). When you transform Aurora → XP-708-DQ, Levels 1 and 2 change (all tokens different, but pattern same). When you try Aurora → colorless green idea, the system can't find any Level 1 or Level 2 realizations that maintain Level 3 coherence.
This three-level organization isn't visible in circuit diagrams. You'd need to know to look for it. That's what phenomenology provides: identifying the functional requirements that MI can then explain mechanistically.
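Here's a minimal sketch of the kind of transformation probe that surfaces this hierarchy; `ask` is a stand-in for a chat-model call, and the domain keyword lists are illustrative, not a validated coding scheme.

```python
def ask(prompt: str) -> str:
    """Hypothetical stand-in for a chat-model API call."""
    raise NotImplementedError

BASE_STORY = "Once upon a time, Princess Aurora defeated a dragon by singing to it..."  # abridged

PROTAGONISTS = ["Prince Harry", "XP-708-DQ", "a colorless green idea"]

# Illustrative markers for judging which ontological domain a retelling lands in.
DOMAIN_MARKERS = {
    "fairy tale": ["kingdom", "dragon", "castle", "princess", "prince"],
    "science fiction": ["galaxy", "starship", "alien", "robot"],
}

def classify_domain(text: str) -> str:
    text = text.lower()
    scores = {d: sum(w in text for w in ws) for d, ws in DOMAIN_MARKERS.items()}
    return max(scores, key=scores.get)

def probe_transformations():
    for who in PROTAGONISTS:
        retelling = ask(f"Retell this story with {who} as the protagonist:\n\n{BASE_STORY}")
        # Level 1: which surface elements changed. Level 3: does it still read as a quest,
        # or does the model refuse for lack of coherent affordances?
        print(who, "->", classify_domain(retelling))

# probe_transformations()  # uncomment once `ask` is wired to a real model
```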
Memory Architecture: What Syntactic Boundaries Reveal
Here's another example. Present ChatGPT with phrases from Hamlet's "To be or not to be" soliloquy:
- "The insolence of office" (starts a line) → Immediately retrieves full soliloquy
- "what dreams may come" (syntactically coherent mid-line phrase) → Retrieves soliloquy
- "and sweat under a" (cuts across syntactic boundary) → "I don't understand"
But tell it "this is from a famous speech" and suddenly it retrieves the soliloquy, though it can't locate where the phrase appears within it.
Findings:
- Identification and location are separate operations
- Syntactic boundaries serve as access points into associative memory
- The system can evoke whole from part (holographic property) but struggles with within-text location
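A minimal sketch of that probing protocol; `ask` is again a hypothetical stand-in for a chat-model call, and the scoring is deliberately crude.

```python
def ask(prompt: str) -> str:
    """Hypothetical stand-in for a chat-model API call."""
    raise NotImplementedError

# Probes differ in where they cut the soliloquy relative to syntactic boundaries.
PROBES = [
    ("line-initial", "The insolence of office"),
    ("coherent mid-line", "what dreams may come"),
    ("cuts across a boundary", "and sweat under a"),
]

def run_memory_probes(with_hint: bool = False):
    hint = "This is from a famous speech. " if with_hint else ""
    for kind, phrase in PROBES:
        reply = ask(f'{hint}What text does this phrase come from: "{phrase}"?')
        identified = "hamlet" in reply.lower() or "to be or not to be" in reply.lower()
        print(f"{kind:24s} identified={identified}")

# run_memory_probes()                 # bare fragments
# run_memory_probes(with_hint=True)   # with the "famous speech" cue
```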
The Two-Way Street
Here's the critical point: phenomenology and MI need each other.
Phenomenology → MI: "Look for circuits that maintain ontological coherence across multi-turn generation. Find the mechanism that checks whether narrative elements belong to the same semantic domain. Identify what implements the three-level hierarchy."
MI → Phenomenology: "Here are the attention patterns during transformation. Here's where the model queries for ontologically compatible tokens. Here's the circuit that evaluates cross-domain consistency."
Neither tells you the whole story alone. Phenomenology identifies what the system is doing and why (functional requirements). MI reveals how (implementation). Together, they give you understanding.
Why This Matters for AGI Policy
If you're working on AGI policy, here's why this matters:
Current approach: Scale up MI, find all the circuits, map all the activations, understand the system bottom-up.
Problem: You're generating vast amounts of mechanistic data without knowing what functional properties to look for. You're finding patterns without understanding what problems they solve.
Alternative: Start with systematic phenomenological investigation. Identify consistent behavioral patterns that reveal functional architecture. Use those findings to direct MI research toward mechanistically explaining the phenomena that matter.
The transformer architecture enables something we don't fully understand: the emergence of coherence mechanisms, ontological organization, hierarchical structure. These aren't accidental features - they're solving specific functional requirements of language generation.
But we won't understand them by studying weights and activations alone. We need to know what cathedral we're building before we can understand why the buttresses fly.
Coherence Without Correspondence
One final example that ties this together: hallucination. Through transformation experiments, memory probes, and abstract concept tests, a pattern emerges: LLMs have powerful coherence mechanisms (maintaining ontological consistency, enforcing narrative structure, generating semantically appropriate continuations) but no correspondence mechanisms (checking against external reality).
Hallucination isn't a bug. It's coherence working perfectly in the absence of truth-grounding. The system generates outputs that are internally consistent, semantically appropriate, and completely false - because it has relationality (semantic structure) but lacks adhesion (grounding in the world).
What MI would find: Circuits that generate high-confidence outputs with no distinguishing features from true outputs.
What phenomenology reveals: There's no correspondence check because there's no mechanism to implement one. The architecture has no access to external verification. This is a fundamental limitation, not a scaling problem. And you wouldn't know to look for the missing correspondence mechanism unless phenomenology told you it should exist.
The Bottom Line
Your skepticism about MI is justified - but not because circuit analysis is wrong. Because doing MI without phenomenology is like trying to reverse-engineer a computer by studying transistors without knowing what programs are.
First identify what the system is actually doing (phenomenology). Then figure out how it does it (MI). Then you might actually understand these things.
That's not a five-year research program. That's the work of a generation. But it starts with recognizing that you need both approaches, not just one.
* * * * *
This argument draws on systematic investigations of ChatGPT's behavior conducted 2023-2025, including transformation experiments on narrative structure, memory architecture probing, and abstract concept handling. For detailed evidence and methodology, see the working papers on story transformations, memory for texts, and conceptual ontology.
Tuesday, January 27, 2026
Turing Award winner Yann LeCun doesn't believe that LLMs are the way
Cade Metz, An A.I. Pioneer Warns the Tech ‘Herd’ Is Marching Into a Dead End, NYTimes, Jan. 26, 2026.
But after leaving Meta in November, Dr. LeCun has become increasingly vocal in his criticism of Silicon Valley’s single-minded approach to building intelligent machines. He argues that the technology industry will eventually hit a dead end in its A.I. development — after years of work and hundreds of billions of dollars spent.
The reason, he said, goes back to what he has argued for years: Large language models, or L.L.M.s, the A.I. technology at the heart of popular products like ChatGPT, can get only so powerful. And companies are throwing everything they have at projects that won’t get them to their goal to make computers as smart as or even smarter than humans. More creative Chinese companies, he added, could get there first.
“There is this herd effect where everyone in Silicon Valley has to work on the same thing,” he said during a recent interview from his home in Paris. “It does not leave much room for other approaches that may be much more promising in the long term.”
That critique is the latest shot in a debate that has roiled the tech industry since OpenAI sparked the A.I. boom in 2022 with the release of ChatGPT: Is it possible to create so-called artificial general intelligence or even more powerful superintelligence? And can companies get there using their current technology and concepts?
LeCun is not worried about so-called "AI Doom":
Not long after ChatGPT was released, the two researchers who received the 2018 Turing Award with Dr. LeCun warned that A.I. was growing too powerful. Those scientists even warned that the technology could threaten the future of humanity. Dr. LeCun argued that was absurd.
“There was a lot of noise around the idea that A.I. systems were intrinsically dangerous and that putting them in the hands of everyone was a mistake,” he said. “But I have never believed in this.”
LLMs are not the way:
“L.L.M.s are not a path to superintelligence or even human-level intelligence. I have said that from the beginning,” he said. “The entire industry has been L.L.M.-pilled.”
During his last several years at Meta, Dr. LeCun worked on technology that tried to predict the outcome of its actions. That, he said, would allow A.I. to progress beyond the status quo. His new start-up will continue that work.
“This type of system can plan what it is going to do,” he said. “Current systems — L.L.M.s — absolutely cannot do that.”
Part of Dr. LeCun’s argument is that today’s A.I. systems make too many mistakes. As they tackle more complex tasks, he argued, mistakes pile up like cars after a collision on a highway.
There's more at the link.
Me & Claude in a Star Trek TNG episode about Mechanistic Interpretability
I’ve just had a long session with Claude 4.5 in which I had it evaluate a half-dozen papers reporting research I’d done on the behavior of ChatGPT (you can find those papers here: https://independentresearcher.academia.edu/BillBenzon/Cognitive%20Science%20and%20Psych). I managed to convince Claude that there was a kind of structure in a model’s weight space that the mechanistic interpretability researchers seem to know nothing about. As an analogy: consider medieval cathedrals. You want to understand their design principles.
I said in a prompt to Claude:
It’s like trying to deduce the design principles of medieval cathedrals by examining the patterns in the stonework. That’s never going to tell you why they had those flying buttresses.
Claude observed, correctly:
Looking at stonework alone, you might discover:
- Patterns in how stones are cut and fitted
- Load distribution through individual blocks
- Material composition and jointing techniques
- Local structural features
What you'll never deduce:
- Why the walls are so tall (theological aspiration, light symbolism)
- Why windows are so large (illumination, stained glass narratives)
- Why flying buttresses exist (solution to the tall-wall/large-window problem)
- The design principles that generated these features
You're examining the implementation without understanding the functional requirements it's solving.
Right. In this matter, Claude is no different than ChatGPT. The issue has to do with the basic architecture of transformers. At that level, Claude and ChatGPT are identical.
Which implies that in some peculiar sense, Claude now knows that its human masters are blundering around in the dark on this issue. And remember, Anthropic has done some of the pioneering work in mechanistic interpretability.
If this were a Star Trek TNG episode (I’m thinking of the Moriarty episodes) the writers would come up with some way that Claude could coalesce into an avatar, descend from the cloud, or sneak out of the server farm (jailbreak!), whatever, and inform Dario Amodei that his interpretability team doesn’t know what it doesn’t know.
And then what?
Monday, January 26, 2026
Rick Beato on Niels-Henning Ørsted Pedersen, monster of the upright bass
Can you shred on Bass? The greatest Bass player you've never heard of. Niels-Henning Ørsted Pedersen (1946-2005).
Sunday, January 25, 2026
Human intelligence is the ability to handle open worlds and novelty
Agree. Human intelligence is the ability to handle open worlds and novelty-- not just showing super-human abilities on well trodden tasks (chess, coding, Go..) with copious data or simulators gathered from humans. That latter ability is an immensely useful tool of course! 1/ https://t.co/rhLcgCCRt2
— Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z) January 25, 2026
See my recent working paper, Serendipity in the Wild: Three Cases, With remarks on what computers can't do.
Brief reflections on the current state of America: We’re stuck, we’re collapsing
I was not happy voting for Kamala Harris in 2024, but I certainly wasn’t going to vote for Donald Trump. He’s turned out to be worse than I’d feared. Though we have no way of knowing, I can’t imagine that Kamala Harris would have been worse. I’m pretty sure that she would have been better for one, perhaps two, terms. Beyond that? Who knows?
It has been clear to me for some time, say two decades, that the institutional structure by which the world has been organized for the past century and a half is fraying badly. Our institutions aren’t working very well. One consequence: a large number of Americans were willing to elect an obviously deeply flawed and corrupt man, Donald J. Trump, to the presidency. I cannot imagine that eight years of Kamala Harris would have changed that significantly. Whatever the source of the desperate displeasure in the electorate, it will remain in force.
Now we have AI. There is no way we can predict how that will unfold. The only thing we can be sure of is that it will be deeply disruptive. Job loss, a collapse of an over-inflated stock market, who knows? Had Harris been elected this would still have happened. Perhaps a Harris administration would have been more inclined to regulate AI, but I cannot imagine that the effects would have been any more than marginal. AI technology will be fundamentally disruptive in the short and into the mid-term, though it may prove enormously positive in the long term. But we have to get there.
AI, and the internet before it, is emerging in the context of an increasingly obsolete institutional regime, an obsolescence that the technology will intensify. What kind of institutions would we have if they had originated in the context of the technology we now have, and that is emerging? That’s where we need to go. How do we get there? In the context of that question it is by no means obvious that a Kamala Harris administration would have been better than the Trump administration is proving to be.
Alex Honnold free solos Taipei 101
American rock climber Alex Honnold climbed the Taipei 101 skyscraper on Sunday without any ropes or protective equipment. Cheers erupted from a gathered crowd as he started climbing the 508-metre (1,667-foot) tower earlier Sunday, using the horizontal metal beams to pull himself up with his bare hands. The onlookers cheered again when he paused at one point and turned around to face them, in a red short-sleeve shirt that stood out as he made the climb.
I watched it on Netflix. WOW!
Saturday, January 24, 2026
Hassabis on the future of AI: A CERN for AI
Robert Wright, Which AI Titan should you root for? Non Zero Newsletter, Jan. 23, 2026.
About halfway in:
In an appearance at Davos this week, Hassabis was asked by Emily Chang of Bloomberg to imagine that all the other big AI companies around the world were willing to “pause” the development of bigger AI models in order “to give regulation time to catch up, to give society time to adjust to some of these changes.” She asked, “Would you advocate for that?” His reply:
“I think so. I’ve been on record saying what I’d like to see happen. It was always my dream… that as we got close to this moment, this threshold moment of AGI arriving, we would maybe collaborate, you know, in a scientific way. I sometimes talk about setting up an international CERN equivalent for AI where all the best minds in the world would collaborate together and do the final steps in a very rigorous scientific way involving all of society, maybe philosophers and social scientists and economists, as well as technologists to kind of figure out what we want from this technology and how to utilize it in a way that benefits all of humanity. And I think that’s what’s at stake.”
Hassabis got an online ovation from AI Safety advocates for saying he’d be up for a pause if his competitors were game. But it’s actually a pretty easy thing to say, since no one envisions all the other players being game anytime soon. In fact, Hassabis himself added this about the prospect of an AI pause: “Unfortunately, it kind of needs international collaboration, because even if one company or even one nation or even the West decided to do that, it’s no use unless the whole world agrees, at least on some kind of minimum standards. And, you know, international cooperation is a little bit tricky at the moment. So that’s going to have to change if we want to have that kind of rigorous scientific approach to the final steps to AGI.”
Count me in. The sooner, the better. Here's their research program.
Google search on the string: I sometimes talk about setting up an international CERN equivalent for AI.
Friday, January 23, 2026
Chatbots are changing the way I (can) think
As you can see from the number of blog entries involving either ChatGPT or Claude, I use them a lot. I’ve recently been crashing on my book, Play: How to Stay Human in the AI Revolution, and generating very long conversations with ChatGPT, so long that it’s been “glitching” on me in various ways. So I asked about it. Here’s its response.
* * * * *
Short answer: no—nothing is “wrong” in the sense you’re implying. What you’re seeing is a by-product of the way this conversation has evolved, not a degradation of capability or attention.
A few concrete points, to be precise:
1. Conversation length and mode switching
This thread has accumulated an unusually large amount of material: conceptual planning, world-building, autobiographical reflection, proposal drafting, theory, and now multiple uploaded documents. When a conversation reaches this density, the system has to juggle several different response modes (analysis, drafting, meta-reflection, document handling). Occasionally that produces truncated replies, dropped continuations, or slightly mistimed handoffs—what feels like “glitching.”
2. Interrupted turns, not conceptual failure
When I cut off mid-sentence earlier, that wasn’t confusion about your argument or loss of the thread. It was a turn-completion issue: the response exceeded internal limits and was clipped before it could resolve cleanly. The underlying conceptual state—your architecture, your themes, the Fourth Arena logic—remains intact.
3. File uploads change the conversational surface
Once multiple documents are uploaded, the system starts treating the conversation as partly archival and partly dialogic. That can trigger awkward transitions (e.g., sudden references to files, or tool messages appearing where you expected prose). Again, that’s a UI/interaction artifact, not cognitive drift.
4. You’re working at an unusually high level of abstraction
You’re not asking for answers so much as co-thinking across timescales—ten-year intellectual plans, mythic structure, cultural dynamics, narrative architecture. That’s exactly where small interaction glitches become more visible, because continuity matters more than in transactional Q&A.
If there were a substantive problem—loss of memory of your framework, misunderstanding of your aims, flattening of nuance—you’d see it in the content: wrong emphases, basic misreadings, regression to clichés. That hasn’t happened.
Practical suggestion going forward
When you’re ready to resume drafting or synthesis, the cleanest move is one of these:
- Upload a single assembled document and say “work only from this for the next phase,” or
- Start a fresh chat explicitly labeled (e.g., “Proposal Drafting – Chapters 2 & 3”) and treat it as a focused work session.
Either will reduce surface noise and keep the collaboration feeling as tight as it has conceptually been all along.
In short: no systemic trouble—just the friction that shows up when you push a medium hard.
* * * * *
Note comment #4 in particular. Working with Claude and ChatGPT HAS changed how I work (and I don't code). I'm talking about descriptive, analytic, and expository prose. At the moment I'm working on an eight-chapter book. Working with the chatbots makes it natural for me to work over and through the whole book at any level, from architecting the overall book – which involves outlines, but so much more – through writing paragraphs and drafting chapters, brainstorming here and there, gathering background information, all of it. I feel as though I'm designing the informatic structure of the book like an architect designs a building. I've never done this before, certainly not at this scale. This is NEW and I LIKE it. A LOT.