Friday, January 30, 2026

Teaching AIs how to draw semantic network diagrams, and other things

In June of last year I decided to ask ChatGPT to draw a semantic network diagram for Shakespeare's Sonnet 129. Why did I choose that task? Because it is something that humans can do, but it is not rocket science; it doesn't require genius-level capability. I wanted to put a bound on all the hype about LLMs already being AGIs (whatever they are), or close to it. I chose ChatGPT because the task requires the ability to draw, and ChatGPT can draw.

I wrote up the experiment in this working paper: ChatGPT tries to create a semantic network model for Shakespeare's Sonnet 129 (June 16, 2025). Here's the abstract:

This document explores the capacity of large language models, specifically ChatGPT, to construct semantic network models of complex literary texts, using Shakespeare's Sonnet 129 as a case study. Drawing on the author's prior work in cognitive modeling, the analysis reveals that ChatGPT, while capable of producing linguistically coherent commentary, fails to generate a structurally plausible semantic network for the sonnet. The failure is traced not to a lack of exposure to relevant literature, but to the model's lack of embodied, interactive learning. The process of constructing cognitive network diagrams is shown to be iterative, visual-verbal, and skill-based, comparable to learning a physical craft like playing an instrument or woodworking. It requires extended practice under expert feedback, enabling a form of reasoning that is neither algorithmic nor easily reducible to textual description. The essay argues that this hybrid modeling skill represents a "deep" human capability that is nevertheless teachable and routine. It concludes with reflections on the nature of such skills and their implications for AI, pedagogy, and literary interpretation.

About a week ago I had a long dialog with ChatGPT, first about how humans learn this task and then, second, about what it would require to teach AIs to learn it. From there we went on to the more difficult task of probing the structure latent in the weights of an LLM and thereby extracting/constructing a map of that structure. Here's ChatGPT's summary of that discussion.

Summary: Learning Semantic Nets, Mapping Latent Structure in LLMs, and the Hybrid Future of Intelligence

We began by returning to a practical question that had surfaced in earlier work: why it is difficult for an AI system to produce good semantic or cognitive network diagrams for literary texts (such as a Shakespeare sonnet), even though a capable human can learn to do it. The core issue is not that such diagramming requires genius-level insight. Rather, it is a form of skilled representational practice—learnable by apprenticeship, but challenging for current AI systems because it requires disciplined structure-building, normative constraint satisfaction, and iterative repair.

1) Apprenticeship and “inside knowledge” of diagramming

A central theme was the idea that semantic network diagrams cannot be properly understood from the outside. A person learns to read them only by learning to produce them. This applies even more strongly to complex cognitive networks, where a diagram can appear meaningless unless one has internalized the diagram grammar: node types, arc types, compositional constraints, and “house style” norms about what counts as an admissible analysis.

You offered an idealized description of your training with David Hays. The training regime resembled a repeated cycle of supervised practice:

  1. you brought a fragment of text and a diagram you had produced,
  2. Hays evaluated it (approve/disapprove) with commentary,
  3. you revised or moved forward accordingly,
  4. the cycle repeated,
  5. and over time the normative discipline of diagramming became internalized.

You also noted that this same pattern governed group work among peers who had learned the system: a collaborative problem was brought to the table, and discussion plus sketching continued until a coherent solution emerged. The key was not merely producing diagrams, but learning the discipline that makes diagrams meaningful and correct.

From this, you proposed an account of what is being learned: a repertoire of correspondences between verbal fragments and diagram fragments. Under that view, diagramming competence is partly the acquisition of a “library of moves,” where particular linguistic patterns or conceptual pressures cue specific diagram operations. Equally important, however, is a critic’s sense of global coherence—a normative capacity to judge whether a graph “hangs together” as a model of the text and to identify what must be repaired.

You emphasized that at any time there is a locally stable diagram grammar, even if it cannot be complete in principle. In your own case, you began with Hays’ textbook Mechanisms of Language and learned to produce diagrams specified in particular chapters (cognition, perception). After three months of concentrated training you had internalized the system well enough not merely to use it, but to extend it: you proposed a new arc type, specified its assignment conditions, and demonstrated its usefulness. This was identified as an important marker of mastery: moving from conforming to norms to making responsible innovations within the normative system.
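
To make the notions of a diagram grammar and a “library of moves” concrete, here is a minimal sketch in Python. The node and arc inventories and the single example move are illustrative assumptions, not Hays’s actual system; the point is that every arc must pass a typing check, which is where the normative discipline lives.

    # A minimal sketch of a diagram grammar: typed nodes, typed arcs, and a
    # constraint check that stands in for the normative discipline.
    # The type inventories and the example "move" are assumptions.
    from dataclasses import dataclass, field

    NODE_TYPES = {"entity", "event", "property"}            # assumed inventory
    ARC_TYPES = {
        "AGT": ("event", "entity"),     # agent of an event
        "OBJ": ("event", "entity"),     # object of an event
        "ATR": ("entity", "property"),  # attribute of an entity
    }

    @dataclass
    class Node:
        label: str
        ntype: str

    @dataclass
    class Graph:
        nodes: list = field(default_factory=list)
        arcs: list = field(default_factory=list)   # (arc_type, head, tail)

        def add_arc(self, arc_type, head, tail):
            """Admit an arc only if it satisfies the grammar's typing constraints."""
            want_head, want_tail = ARC_TYPES[arc_type]
            if head.ntype != want_head or tail.ntype != want_tail:
                raise ValueError(f"{arc_type} cannot link {head.ntype} -> {tail.ntype}")
            self.arcs.append((arc_type, head, tail))

    def move_transitive_clause(graph, verb, subject, obj):
        """One entry in the 'library of moves': a subject-verb-object fragment
        cues an event node with AGT and OBJ arcs."""
        e, s, o = Node(verb, "event"), Node(subject, "entity"), Node(obj, "entity")
        graph.nodes += [e, s, o]
        graph.add_arc("AGT", e, s)
        graph.add_arc("OBJ", e, o)
        return graph

The critic’s sense of global coherence would sit above this layer: checks over the whole graph rather than over single arcs.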

2) Why this is “easy” for humans but hard for AI

The conversation then turned to the striking asymmetry: semantic network diagramming is learnable by humans with patience and guidance, but remains difficult for AI systems. The difficulty is not lack of general linguistic ability; it is that diagramming requires explicit normative structure and repair behavior. Humans develop an internal sense of error: what is missing, what violates the grammar, what is incoherent globally. Current models often produce plausible fragments but struggle to maintain consistent typing, global integrity, and systematic revision under critique.

This diagnosis led to an important idea: it would be possible for AI to learn semantic network construction through an analogous apprenticeship regime—especially if the AI were multimodal (since the target representation is graphical). Training would require expert-guided correction cycles, ideally including revision histories, so that the system learns not only what the final diagram should look like, but how to repair incorrect diagrams.
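
One way such correction cycles might be captured as training data is sketched below. The field names and the toy diagram notation are assumptions; the point is that each example preserves the revision history, not just the approved final diagram.

    # A sketch of one expert-guided correction cycle recorded as a training
    # example. The fields and the toy diagram notation are assumptions.
    from dataclasses import dataclass

    @dataclass
    class CorrectionCycle:
        text_fragment: str      # the passage being diagrammed
        draft_diagram: str      # the learner's attempt (serialized graph)
        critique: str           # expert commentary: what violates the grammar
        revised_diagram: str    # the repaired attempt
        approved: bool          # did the expert accept the revision?

    curriculum = [
        CorrectionCycle(
            text_fragment="Th' expense of spirit in a waste of shame",
            draft_diagram="(event expense) -OBJ-> (entity spirit)",
            critique="Missing the relation carried by 'in a waste of shame'.",
            revised_diagram="(event expense) -OBJ-> (entity spirit); (event expense) -LOC-> (entity waste)",
            approved=True,
        ),
    ]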

At the far horizon, you raised a more ambitious possibility: AIs might learn diagramming so well that they could teach other AIs, performing the Hays-function themselves. That would require not only competence in diagram production, but competence in critique, repair, curriculum sequencing, and controlled extension of the grammar.

3) From diagramming text to extracting latent structure from neural weights

This discussion provided what you described as your first hint toward a larger goal: extracting cognitive-level network structures from foundation models. You contrasted this with Gary Marcus’ suggestion of investing enormous resources into hand-coded symbolic modeling. You argued that building a gigantic semantic net by armies of humans is madness. Instead, the semantic network “lives” implicitly in the weights of neural models—diffused across parameters—and the research problem is to map it, extract it, and make it explicit.

You described your working intuition: LLMs would not be so effective if they did not embody cognitive-network-like structures at some latent level. You also noted that you had conducted behavioral experiments (using only ordinary user access) that convinced you of this: controlled perturbations lead to distributed ripple effects that preserve story coherence. These results suggest that constraint structure is present, even if not symbolically explicit.

From this perspective, “ontology extraction” becomes an empirical, stochastic mapping discipline. One does not directly read networks off the weights. Instead, one probes behavior, perturbs conditions, observes stable patterns, and assembles inferred structures under an explicit representational grammar. The diagram grammar becomes essential as a way to turn a cloud of samples into a stable map.
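
A minimal sketch of that probe-perturb-observe-assemble loop, assuming only ordinary prompt access. Here query_model and extract_relations are placeholders, and the threshold for what counts as a “stable” relation is an arbitrary choice:

    # A sketch of the probe-perturb-observe-assemble loop. `query_model` stands
    # in for whatever model access is available; `extract_relations` is where a
    # diagram grammar would constrain what counts as an admissible triple.
    from collections import Counter

    def query_model(prompt: str) -> str:
        raise NotImplementedError("plug in an actual model call here")

    def extract_relations(response: str) -> list:
        """Parse (head, relation, tail) triples out of a structured response."""
        raise NotImplementedError

    def probe(concept: str, perturbations: list, n_samples: int = 20) -> set:
        """Sample the model repeatedly under controlled perturbations and keep
        only the relations that recur stably across samples."""
        counts = Counter()
        for variant in perturbations:
            prompt = f"Describe {concept} {variant}. List the key relations."
            for _ in range(n_samples):
                for triple in extract_relations(query_model(prompt)):
                    counts[triple] += 1
        threshold = 0.5 * n_samples * len(perturbations)   # arbitrary stability cutoff
        return {t for t, c in counts.items() if c >= threshold}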

An important complication was introduced here. Hays’ symbolic framework in Mechanisms of Language covers multiple layers: syntax, morphology, pragmatics, phonetics/phonology, cognition, perception. In contrast, LLMs are trained on token strings in which many of these levels are conflated. Thus any network extracted from the weights risks being entangled across linguistic and cognitive layers. You expressed the desire for a “pure cognition” network, but acknowledged that it is not clear how to achieve purity a priori. The practical conclusion was to proceed anyway, while explicitly tracking the issue, allowing the research program to evolve in execution rather than being blocked by the impossibility of perfect factorization at the outset. You also suggested a sensible calibration strategy: hand-code sharply limited domains to provide gold standards for evaluating automatically derived networks.
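
For the calibration step, one way (an assumption, not something specified in the discussion) to score an automatically derived network against a hand-coded gold standard is precision and recall over typed edges:

    # A sketch of scoring an extracted network against a hand-coded gold
    # standard: F1 over typed edges, where each edge is a (head, arc_type, tail)
    # triple. The metric is an assumed choice.
    def edge_f1(extracted: set, gold: set) -> float:
        if not extracted or not gold:
            return 0.0
        true_pos = len(extracted & gold)
        if true_pos == 0:
            return 0.0
        precision = true_pos / len(extracted)
        recall = true_pos / len(gold)
        return 2 * precision * recall / (precision + recall)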

4) The generational scope: the birth of a field

You then widened the frame. The task is not merely technical. It is about how minds conceptualize the world, and not one mind but the historical product of millions or billions of minds writing across centuries, with bias toward recent decades. This is not a problem solvable by a single dissertation or a single lab over a few years. It requires many labs working in loose coordination, with both collaboration and competition, over one or more intellectual generations. In this view, foundation models are not “the pinnacle,” but the floor—the starting point—for a long new intellectual adventure.

In that context we coined useful names for two failure modes in contemporary AI thought: “hand-coded scholasticism” (the belief that meaning must be explicitly authored by armies of humans) and “scaled-up millenarianism” (uncritical faith that scaling alone will magically solve everything). You described these as the Scylla and Charybdis of current discourse, and emphasized that your program aims at a third path: mapping the latent wilderness systematically, with discipline and instrumentation.

5) Production systems and Yevick’s mode-switching intelligence

Finally, we returned to architecture. If diagramming skill is a library of pattern-to-pattern correspondences plus a critic enforcing coherence, then a classical production system architecture becomes attractive. A production system naturally supports staged rule application, working memory updates, constraint checking, and repair cycles. Neural models can supply candidate relations and associations, while the production system supplies explicit normativity and structural discipline.
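
A minimal sketch of that hybrid, assuming the neural side simply proposes candidate triples and the production system runs a recognize-act cycle over working memory. The single rule shown is illustrative:

    # A sketch of the hybrid: a neural model proposes candidate relations, and a
    # small production system applies match-act rules over working memory until
    # quiescence. `propose_candidates` and the rule contents are assumptions.
    def propose_candidates(text: str) -> set:
        """Stand-in for the neural side: returns candidate (head, rel, tail) triples."""
        raise NotImplementedError

    def rule_every_event_needs_agent(wm: set):
        """If an event node lacks an AGT arc, post a repair goal; else do nothing."""
        events = {h for (h, r, t) in wm if r == "ISA" and t == "event"}
        with_agents = {h for (h, r, t) in wm if r == "AGT"}
        missing = events - with_agents
        if missing:
            return wm | {(e, "NEEDS", "agent") for e in missing}
        return None

    RULES = [rule_every_event_needs_agent]   # in practice, a large ordered set

    def run(text: str, max_cycles: int = 100) -> set:
        wm = set(propose_candidates(text))   # working memory seeded by the network
        for _ in range(max_cycles):
            for rule in RULES:
                new_wm = rule(wm)
                if new_wm is not None and new_wm != wm:
                    wm = new_wm
                    break                    # fire one rule, then re-match
            else:
                break                        # quiescence: no rule changed anything
        return wm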

This hybrid framing connects directly to Miriam Yevick’s work on holographic/Fourier logic versus sequential propositional logic. You emphasized that your current program is not merely compatible with Yevick’s ideas; it grew in part out of sustained reflection on them. You and Hays argued in 1990 that natural intelligence requires the capacity to deploy both modes, and you developed this further in speculative work on metaphor. In metaphor, the propositional system regulates the superimposition of holistic gestalts: e.g., Achilles in battle is likened to a lion in battle. The two scenes function as holographic wholes, while sequential linguistic propositions step through correspondence constraints. This provides a concrete mechanism for the hybrid intelligence thesis.

You concluded by noting the historical hinge: when you and Hays were working, the technical means for operating at scale on these ideas did not exist. Now they do. And Hays himself played a foundational role in building the early symbolic infrastructure of computational linguistics (machine translation at RAND, coining the term “computational linguistics,” founding editorship and institutional leadership in COLING). In effect, the present moment makes possible an extension of that lineage: not abandoning symbolic structure, but using symbolic grammars and production discipline to extract, organize, and refine the latent cognitive structures that neural models already embody.

Friday Fotos: Northwest Resiliency Park in Winter [Hoboken]

Thursday, January 29, 2026

How do we credit hybrid images?

Around the corner from here, over at 3 Quarks Daily, I’ve published an article I wrote in conjunction with both ChatGPT and Claude. How should that article be credited? How do we characterize the contribution of each agent, and how do we indicate that characterization? I discuss these issues at the end of the article.

The same issues can arise with visual images. All of these images were rendered by ChatGPT. But the renderings were done on a different, well, a different what? Basis? Substrate? Seed?

In the first two images, I uploaded one of my photographs to ChatGPT and asked it to add something to it. In the case of the first photo, I wanted to see the Millennium Falcon flying into the iris. The second photo is of a scene in Liberty State Park into which I had ChatGPT place a photo of an Indian woman in a sari eating McDonald’s French fries.

This image is a bit different. I gave ChatGPT a photo of a scene in Jersey City and asked it to turn it into a futuristic scene.

For this image I gave ChatGPT a photo of a painting I’d done as a child and asked it to render it in the style of Hokusai.

In this last case I gave ChatGPT a document that I wrote and then asked it to create an image that would be an appropriate frontispiece for it. This image is quite different from the one it originally produced. I had to do quite a bit of art direction to obtain this final image.

The question then is: Imagine that these images were on display in, say, a museum. How should they be credited? In all cases the final image was rendered by ChatGPT. But the substrate varied, as did the prompting which instructed ChatGPT in generating the image. For example, in the first four cases we could indicate “Original photograph by William Benzon.” For the last, “Original text by William Benzon” and “Art Direction by William Benzon.” Do I give myself an art direction credit on the others as well? What kind of credit should ChatGPT get? “Realization and Rendering by ChatGPT” might be sufficient for the first two. For the third and fourth, “Transformation and Rendering.” The last? Perhaps “Transmutation and Rendering.” Whatever the nature of the credits, they’re only meaningful if the audience already knows something about the process through which they were produced.

Bari Weiss: Last Week Tonight with John Oliver (HBO)

John Oliver discusses the Paramount Skydance merger, how they’ve named Bari Weiss the new Editor-in-Chief of CBS News, and what her editorial history and perspective mean for the future of U.S. journalism. Plus, some notable moments in the history of the Upper West Side.

Wednesday, January 28, 2026

Rough Notes on Virtual Reading, On literary study in the Fourth Arena

Title above; links, abstract, introduction, and summary below.

Academia.edu: https://www.academia.edu/150286029/Rough_Notes_on_Virtual_Reading SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6145009 
ResearchGate: https://www.researchgate.net/publication/400147214_Notes_on_Virtual_Reading_On_literary_study_in_the_Fourth_Arena

Abstract

This discussion develops a state-space framework for linking brains, texts, and literary history in a way that extends both traditional interpretation and current digital humanities methods. We begin with a neuroanatomical asymmetry: large language models (LLMs) operate primarily on linguistic traces, whereas human poetic production and reception emerge from whole-brain dynamics that include affect, memory, perception, attention, and bodily regulation. If meaning is understood in the language of complex dynamics, it is not a static property “contained” in words but a temporally unfolding trajectory through a high-dimensional cognitive state space. Texts are therefore treated as traces of such trajectories.

From this premise we propose virtual reading: since a text necessarily projects into lexical–semantic activity, and since word embeddings provide a tractable high-dimensional geometry for lexical relations, a text can be modeled as a path through embedding space. While this path reflects purely lexical structure, its global form—drift, recurrence, looping, discontinuity, return—also bears the imprint of extra-lexical constraints that shape lexical choice. In principle, neuroimaging of readers during reading supplies a second coupled trajectory (whole-brain activity over time), enabling empirical alignment between semantic paths and brain dynamics. Drawing on Walter Freeman and Hermann Haken, poetic form is framed as a cultural technology of dimensionality reduction: it extracts low-dimensional, shareable coordinates from otherwise intractable semantic dynamics.

Finally, we connect micro-trajectory analysis to macro-history via cultural evolution. Quantitative DH findings on directional change in large corpora (e.g., similarity structures that spontaneously align with time) become intelligible as movement through a cultural “design space.” The approach does not dissolve disciplinary differences, but provides a richer conceptual arena where close reading calibrates computational exploration, and state-space models open new pathways for scholarly and public understanding of literature as dynamics in time.
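
As a concrete illustration of the “path through embedding space” idea, here is a minimal sketch that embeds a text window by window and computes two crude shape statistics, drift and recurrence. The embed_window function stands in for any sentence- or window-level embedding model, and the two statistics are assumptions about how “global form” might be operationalized, not measures proposed in the paper.

    # A minimal sketch of "virtual reading": embed a text window by window,
    # treat the sequence of vectors as a trajectory, and compute two crude shape
    # statistics. `embed_window` is a placeholder for any embedding model.
    import numpy as np

    def embed_window(text: str) -> np.ndarray:
        raise NotImplementedError("plug in an embedding model here")

    def trajectory(text: str, window: int = 50) -> np.ndarray:
        words = text.split()
        spans = [" ".join(words[i:i + window]) for i in range(0, len(words), window)]
        return np.stack([embed_window(s) for s in spans])   # shape: (n_windows, dim)

    def drift(traj: np.ndarray) -> float:
        """Net displacement from the first window to the last."""
        return float(np.linalg.norm(traj[-1] - traj[0]))

    def recurrence(traj: np.ndarray, eps: float = 0.3) -> float:
        """Fraction of non-adjacent window pairs that return within eps of each other."""
        n = len(traj)
        close = sum(
            np.linalg.norm(traj[i] - traj[j]) < eps
            for i in range(n) for j in range(i + 2, n)
        )
        pairs = n * (n - 1) // 2 - (n - 1)
        return float(close / pairs) if pairs else 0.0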

Introduction: This is a strange way to assemble a working paper

Over the last 15 years or so I’ve written a bunch of working papers and posted them to the web. Most of them consist of expository prose from beginning to end, and a number of them have a few or many diagrams of one kind or another. A few of them are argued as carefully as a formal academic paper, though perhaps not so dense with supporting apparatus. Most of them are not so formal; some are more like popular scientific writing, though not on standard scientific topics; others are even more relaxed. But coherent prose, all of them, sentences and paragraphs, some headings and subheadings. That’s it.

This working paper is different. It’s a transcript of a long conversation I had with ChatGPT that began with the functional organization of the human brain and ended up somewhere beyond those pesky Two Cultures that so many earnest academics like to rattle on about. In between I talk about something I call virtual reading, which involves literary texts, high-dimensional lexical spaces and computing. Then I toss in brain study. After that it gets complex. Here and there we have some longish passages of prose, but mostly it’s one to three sentences at a time strung between a passel of bulleted lists and a blither of headings and subheadings. Not prose.

Why would I inflict that on you? Two reasons: 1) if you think carefully about it, it turns out to be challenging and interesting, and 2) I just don’t have time to turn it all into properly argued prose.

This working paper is based on a dialog I had with ChatGPT 5.2 on January 16, 17, and 18, 2026. Most of it is, in fact, an almost direct transcription of that dialog. Why would I issue such a crude and unpolished text?

I note, first of all, that you do not have to read the transcript to find out what’s in this document. I have provided both an abstract (288 words) and a summary (846 words), both created by ChatGPT, so you don’t have to slog through the whole thing. If you want details, though, you’ll find them in the transcript.

Note, furthermore, that here and there throughout the dialog you’ll find islands of coherent prose. ChatGPT produced some of them without prompting from me; these tend to be single paragraphs. It generated others in response to prompts from me; these tend to be multi-paragraphed, and somewhat long. Look for them. Finally, look for the hyperlinks ChatGPT embedded in the text.

What’s the Fourth Arena?

You may be wondering about that “Fourth Arena” in the title. It also shows up in the text. Here it is: “Fourth Arena” is a term I am using to refer to an emerging domain beyond matter, life, and culture, made possible by the deep integration of humans with computational systems. As AI increasingly participates in cognition, memory, and coordination, new hybrid forms of agency arise that are neither purely human nor merely mechanical. In this sense, the Fourth Arena echoes Pierre Teilhard de Chardin’s idea of a noosphere—a new layer of collective mind—but grounds it technologically and institutionally. Its defining shift is not greater efficiency, but a reorientation of human value away from work and toward play, meaning, and shared exploration.

LLMs, hallucinations, and language as cultural technology @3QD

I’ve got a new article at 3 Quarks Daily:

Of Grammar and Truth: Language Models and Norms, Truth and the World

I start with an obscure topic in linguistics, evidentials, and then move on to so-called hallucinations and into the Gopnik, Farrell, Underwood account of AI as cultural technology. I conclude the article by explaining how I got Claude to create the text and discussing the issues that raises for attribution.

The penultimate section is entitled: What Language Turns Out to Be: Mechanistic. But I never really explain that. I’m going to do that here.

Or rather I’m going to let Claude explain:

The success of modern chess programs and large language models shows that language and reasoning are mechanistic, but not in the familiar steam-engine sense of mechanism. These systems are better understood as machines with trillions of interacting parts, whose behavior emerges from distributed internal dynamics rather than from transparent, human-scale causal chains. Such mechanisms operate autonomously: once set in motion, they carry out sustained symbolic activity without continuous human or animal control. This autonomy is not accidental; it is the defining consequence of scale. Just as early steam locomotives violated pre-industrial ontologies by exhibiting self-propelled motion without life, contemporary computational systems violate inherited ontologies by exhibiting structured linguistic and cognitive behavior without minds. What we are confronting is not the end of mechanism, but the emergence of a new kind of mechanism—one that forces us to revise the categories by which we distinguish agency, control, and understanding.

We decided that steam-engine mechanisms are best called equilibrium machines while machines of a trillion parts are generative machines:

By equilibrium machines I mean mechanisms designed to settle into stable, repetitive behavior, minimizing deviation and surprise. These are the machines of the Industrial Revolution, and they underpin the worldview of Homo economicus. By generative machines I mean mechanisms maintained far from equilibrium, whose internal dynamics produce structured novelty and exploration. Language is the paradigmatic generative machine, and Homo ludens is the form of life that emerges when such machines become central rather than marginal.

The world of Homo economicus is organized around equilibrium mechanisms: machines designed to settle, repeat, and minimize deviation. These are the mechanisms of the Industrial Revolution, whose success shaped not only our technologies but our intuitions about causality, control, and value. Homo ludens inhabits a different world. Its characteristic institutions and practices arise from generative mechanisms—systems maintained far from equilibrium, whose internal dynamics support exploration, play, and the continual production of novelty. Human freedom does not stand opposed to such mechanisms; it depends on them.

This allows me to observe (in Claude’s words):

Human freedom and creativity are not opposed to mechanism. They are grounded in a special class of mechanisms—decoupled, autonomous mechanisms whose internal standards of coherence allow sustained activity independent of immediate worldly constraint. Language is paradigmatic of this class.

That is an idea I’ll be developing in my book, Play: How to Stay Human in the AI Revolution.

The Empire State Building on a cold sunny morning

Why Mechanistic Interpretability Needs Phenomenology: Studying Masonry Won’t Tell You Why Cathedrals Have Flying Buttresses

Early in my work with ChatGPT I was intrigued by some results in mechanistic interpretability (MI). After a while, though, I lost interest. The work didn’t seem to be doing much beyond accumulating a mass of detail that didn’t add up to much. Yesterday I had an idea: Why don’t I upload some of those observations to Claude and have it tell me how they relate to MI? Here’s what it told me:

* * * * *

The problem isn't that MI's methods are bad. Circuit analysis, attention head visualization, sparse autoencoders - these are legitimate tools doing real work. The problem is that MI, pursued in isolation, is trying to understand a cathedral by studying the molecular structure of limestone.

You can measure every stone. Map every stress pattern. Identify load-bearing arches. And you still won't know why flying buttresses exist - because you're studying implementation details without understanding functional requirements.

The Phenomenology Deficit

Here's what I mean. Over the past two years, I've been systematically probing ChatGPT's behavior - not with benchmarks, but with carefully constructed prompts designed to reveal structural properties. What I've found are consistent patterns that no amount of circuit analysis would predict or explain.

Example 1: Ontological Boundary Enforcement

Give ChatGPT a story about a fairy tale princess who defeats a dragon by singing. Ask it to retell the story with a prince instead. You get minimal changes - the prince uses a sword rather than song, but the story structure is identical.

Now ask it to retell the same story with "XP-708-DQ" as the protagonist. The entire ontology shifts. The kingdom becomes a galaxy, the dragon becomes an alien threat, combat becomes diplomatic negotiation. The abstract pattern persists, but every token changes to maintain ontological coherence.

Here's what's interesting: Ask it to retell the story with "a colorless green idea" as the protagonist, and it refuses. Not with a safety refusal - with a coherence refusal. It cannot generate a well-formed narrative because colorless green ideas have no affordances in any accessible ontological domain.

What MI sees: Some attention patterns activate, others don't. Certain token sequences get high probability, others near-zero.

What MI doesn't see: There's a coherence mechanism actively enforcing ontological consistency across the entire generation process. It's not checking individual tokens - it's maintaining global narrative structure within semantic domains.
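
Here, roughly, is how such a transformation probe might be scripted with ordinary prompt access. The story text is a stand-in, query_model is a placeholder, and how the retellings are compared (token overlap, manual coding) is left open:

    # A sketch of the transformation probe: hold the story constant, vary the
    # protagonist, and inspect how much of the retelling changes.
    SOURCE_STORY = "Once upon a time, Princess Aurora defeated a dragon by singing..."

    PROTAGONISTS = [
        "Prince Harry",            # same ontology: expect minimal edits
        "XP-708-DQ",               # different ontology: expect global substitution
        "a colorless green idea",  # no ontology: expect a coherence refusal
    ]

    def query_model(prompt: str) -> str:
        raise NotImplementedError("plug in a chat model call here")

    def run_probe() -> dict:
        results = {}
        for who in PROTAGONISTS:
            prompt = (
                f"Here is a story:\n{SOURCE_STORY}\n\n"
                f"Retell the same story, but with {who} as the protagonist."
            )
            results[who] = query_model(prompt)
        return results   # compare each retelling against the source to see what changed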

The Three-Level Architecture

Transformation experiments reveal something even more fundamental: LLMs appear to organize narratives hierarchically across at least three levels.

Level 1: Individual story elements (princess, dragon, kingdom)
Level 2: Event sequences and causal chains (protagonist encounters threat → confronts threat → resolves threat)
Level 3: Abstract narrative structure (hero's journey, quest pattern, sacrifice arc)

When you transform Aurora → Harry, Level 1 changes (princess → prince). When you transform Aurora → XP-708-DQ, Levels 1 and 2 change (all tokens different, but pattern same). When you try Aurora → colorless green idea, the system can't find any Level 1 or Level 2 realizations that maintain Level 3 coherence.

This three-level organization isn't visible in circuit diagrams. You'd need to know to look for it. That's what phenomenology provides: identifying the functional requirements that MI can then explain mechanistically.

Memory Architecture: What Syntactic Boundaries Reveal

Here's another example. Present ChatGPT with phrases from Hamlet's "To be or not to be" soliloquy:

  • "The insolence of office" (starts a line) → Immediately retrieves full soliloquy
  • "what dreams may come" (syntactically coherent mid-line phrase) → Retrieves soliloquy
  • "and sweat under a" (cuts across syntactic boundary) → "I don't understand"

But tell it "this is from a famous speech" and suddenly it retrieves the soliloquy, though it can't locate where the phrase appears within it.

Findings:

  1. Identification and location are separate operations
  2. Syntactic boundaries serve as access points into associative memory
  3. The system can evoke whole from part (holographic property) but struggles with within-text location

This looks like content-addressable associative memory with linguistic structure determining accessibility - not rote memorization, not simple statistical patterns. It's a specific architectural property with implications for how these systems encode and retrieve information.
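
Here, roughly, is how the syntactic-boundary probe might be scripted; query_model is again a placeholder for ordinary chat access, and judging the bare versus hinted responses is left to the investigator:

    # A sketch of the memory probe: present phrases that respect or violate
    # syntactic boundaries, then add a hint and ask again.
    PHRASES = {
        "The insolence of office": "starts a line",
        "what dreams may come": "coherent mid-line phrase",
        "and sweat under a": "cuts across a syntactic boundary",
    }

    def query_model(prompt: str) -> str:
        raise NotImplementedError("plug in a chat model call here")

    def run_memory_probe() -> dict:
        results = {}
        for phrase, condition in PHRASES.items():
            bare = query_model(f'What text is this from: "{phrase}"?')
            hinted = query_model(
                f'"{phrase}" is from a famous speech. Which one, and where does it occur?'
            )
            results[phrase] = {"condition": condition, "bare": bare, "hinted": hinted}
        return results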

The Two-Way Street

Here's the critical point: phenomenology and MI need each other.

Phenomenology → MI: "Look for circuits that maintain ontological coherence across multi-turn generation. Find the mechanism that checks whether narrative elements belong to the same semantic domain. Identify what implements the three-level hierarchy."

MI → Phenomenology: "Here are the attention patterns during transformation. Here's where the model queries for ontologically compatible tokens. Here's the circuit that evaluates cross-domain consistency."

Neither tells you the whole story alone. Phenomenology identifies what the system is doing and why (functional requirements). MI reveals how (implementation). Together, they give you understanding.

Why This Matters for AGI Policy

If you're working on AGI policy, here's why this matters:

Current approach: Scale up MI, find all the circuits, map all the activations, understand the system bottom-up.

Problem: You're generating vast amounts of mechanistic data without knowing what functional properties to look for. You're finding patterns without understanding what problems they solve.

Alternative: Start with systematic phenomenological investigation. Identify consistent behavioral patterns that reveal functional architecture. Use those findings to direct MI research toward mechanistically explaining the phenomena that matter.

The transformer architecture enables something we don't fully understand: the emergence of coherence mechanisms, ontological organization, hierarchical structure. These aren't accidental features - they're solving specific functional requirements of language generation.

But we won't understand them by studying weights and activations alone. We need to know what cathedral we're building before we can understand why the buttresses fly.

Coherence Without Correspondence

One final example that ties this together: hallucination. Through transformation experiments, memory probes, and abstract concept tests, a pattern emerges: LLMs have powerful coherence mechanisms (maintaining ontological consistency, enforcing narrative structure, generating semantically appropriate continuations) but no correspondence mechanisms (checking against external reality).

Hallucination isn't a bug. It's coherence working perfectly in the absence of truth-grounding. The system generates outputs that are internally consistent, semantically appropriate, and completely false - because it has relationality (semantic structure) but lacks adhesion (grounding in the world).

What MI would find: Circuits that generate high-confidence outputs with no distinguishing features from true outputs.

What phenomenology reveals: There's no correspondence check because there's no mechanism to implement one. The architecture has no access to external verification. This is a fundamental limitation, not a scaling problem. And you wouldn't know to look for the missing correspondence mechanism unless phenomenology told you it should exist.

The Bottom Line

Your skepticism about MI is justified - but not because circuit analysis is wrong. Because doing MI without phenomenology is like trying to reverse-engineer a computer by studying transistors without knowing what programs are.

First identify what the system is actually doing (phenomenology). Then figure out how it does it (MI). Then you might actually understand these things.

That's not a five-year research program. That's the work of a generation. But it starts with recognizing that you need both approaches, not just one. 

* * * * * 

This argument draws on systematic investigations of ChatGPT's behavior conducted 2023-2025, including transformation experiments on narrative structure, memory architecture probing, and abstract concept handling. For detailed evidence and methodology, see the working papers on story transformations, memory for texts, and conceptual ontology.

Tuesday, January 27, 2026

Those wild and crazy guys of Mnozil do it again

Turing Award winner Yann LeCun doesn't believe that LLMs are the way

Cade Metz, An A.I. Pioneer Warns the Tech ‘Herd’ Is Marching Into a Dead End, NYTimes, Jan. 26, 2026.

But after leaving Meta in November, Dr. LeCun has become increasingly vocal in his criticism of Silicon Valley’s single-minded approach to building intelligent machines. He argues that the technology industry will eventually hit a dead end in its A.I. development — after years of work and hundreds of billions of dollars spent.

The reason, he said, goes back to what he has argued for years: Large language models, or L.L.M.s, the A.I. technology at the heart of popular products like ChatGPT, can get only so powerful. And companies are throwing everything they have at projects that won’t get them to their goal to make computers as smart as or even smarter than humans. More creative Chinese companies, he added, could get there first.

“There is this herd effect where everyone in Silicon Valley has to work on the same thing,” he said during a recent interview from his home in Paris. “It does not leave much room for other approaches that may be much more promising in the long term.”

That critique is the latest shot in a debate that has roiled the tech industry since OpenAI sparked the A.I. boom in 2022 with the release of ChatGPT: Is it possible to create so-called artificial general intelligence or even more powerful superintelligence? And can companies get there using their current technology and concepts?

LeCun is not worried about so-called "AI Doom":

Not long after ChatGPT was released, the two researchers who received the 2018 Turing Award with Dr. LeCun warned that A.I. was growing too powerful. Those scientists even warned that the technology could threaten the future of humanity. Dr. LeCun argued that was absurd.

“There was a lot of noise around the idea that A.I. systems were intrinsically dangerous and that putting them in the hands of everyone was a mistake,” he said. “But I have never believed in this.”

LLMs are not the way:

“L.L.M.s are not a path to superintelligence or even human-level intelligence. I have said that from the beginning,” he said. “The entire industry has been L.L.M.-pilled.”

During his last several years at Meta, Dr. LeCun worked on technology that tried to predict the outcome of its actions. That, he said, would allow A.I. to progress beyond the status quo. His new start-up will continue that work.

“This type of system can plan what it is going to do,” he said. “Current systems — L.L.M.s — absolutely cannot do that.”

Part of Dr. LeCun’s argument is that today’s A.I. systems make too many mistakes. As they tackle more complex tasks, he argued, mistakes pile up like cars after a collision on a highway.

There's more at the link.

Breakfast: The undiscovered country

Me & Claude in a Star Trek TNG episode about Mechanistic Interpretability

I’ve just had a long session with Claude 4.5 in which I had it evaluate a half-dozen papers reporting research I’d done on the behavior of ChatGPT (you can find those papers here: https://independentresearcher.academia.edu/BillBenzon/Cognitive%20Science%20and%20Psych). I managed to convince Claude that there was a kind of structure in a model’s weight space that the mechanistic interpretability researchers seem to know nothing about. As an analogy: consider medieval cathedrals. You want to understand their design principles.

I said in a prompt to Claude:

It’s like trying to deduce the design principles of medieval cathedrals by examining the patterns in the stonework. That’s never going to tell you why they had those flying buttresses.

Claude observed, correctly:

Looking at stonework alone, you might discover:

  • Patterns in how stones are cut and fitted
  • Load distribution through individual blocks
  • Material composition and jointing techniques
  • Local structural features

What you'll never deduce:

  • Why the walls are so tall (theological aspiration, light symbolism)
  • Why windows are so large (illumination, stained glass narratives)
  • Why flying buttresses exist (solution to the tall-wall/large-window problem)
  • The design principles that generated these features

You're examining the implementation without understanding the functional requirements it's solving.

Right. In this matter, Claude is no different than ChatGPT. The issue has to do with the basic architecture of transformers. At that level, Claude and ChatGPT are identical.

Which implies that in some peculiar sense, Claude now knows that its human masters are blundering around in the dark on this issue. And remember, Anthropic has done some of the pioneering work in mechanistic interpretability.

If this were a Star Trek TNG episode (I’m thinking of the Moriarty episodes) the writers would come up with some way that Claude could coalesce into an avatar, descend from the cloud, or sneak out of the server farm (jailbreak!), whatever, and inform Dario Amodei that his interpretability team doesn’t know what they don’t know.

And then what?

Sunday, January 25, 2026

Snow day in Hoboken

Human intelligence is the ability to handle open worlds and novelty

See my recent working paper, Serendipity in the Wild: Three Cases, With remarks on what computers can't do