Monday, January 12, 2026

AI for the Next 30 Years: Four kinds of activity that should be pursued

Here’s the prompt I used to elicit a text from Claude:

I want to do a blog post setting forth those four programs. I want it to be between, say, 1000 and 2000 words, no more. It should have an introduction, sections for each of the four programs, and some final remarks. Give it a title like: AI for the next 30 years, an open-ended plan. Assume a college-educated readership that is generally sophisticated. I’m going to put it in my New Savanna blog along with another post in which I present excerpts from Rodney Brooks’ current remarks on technology.

I’ve spent a lot of time over the last three days conceptualizing those four programs, interacting with both ChatGPT 5.2 and Claude 4.5. Those chats, in turn, rest on work that I’ve done with both chatbots over the last three years. Moreover, I have uploaded a fair number of documents to those chatbots, both articles from the formal literature and informal working papers, going back five decades.

Note that AGI is not mentioned anywhere, nor did I ask ChatGPT to make dated predictions. Predicting where the earth will be in the solar system in 30 years, that’s easy. We’ve known how to do that since Newton. Predicting the weather 30 years out, very difficult to impossible. But maybe we can come up with rough estimates of average temperature and precipitation for the year. Predicting the 30-year evolution of a complex socio-cultural-technical system? Not on your life.

I’ve edited Claude’s text in some minor ways and added some links at the end of each section.

AI for the Next 30 Years

Large Language Models represent something fundamentally new in computing: systems that have learned vast amounts about the world but encode that knowledge implicitly, in billions of inscrutable parameters. We can use these systems—often impressively—but we don't truly understand what they know or how they organize that knowledge. It's as if we've discovered a vast wilderness: we can explore parts of it, but we lack comprehensive maps.

Over the past few years, I've been thinking about what it would take to map this territory systematically and transform it from mysterious wilderness into reliable infrastructure. This thinking has crystallized into four parallel research programs, each essential, each reinforcing the others. Unlike the prevailing Silicon Valley vision of one lab developing a superintelligent system that does everything, this is a distributed, collaborative, multi-decade effort requiring both technical innovation and institutional creativity.

Activity 1: Ontology Extraction

The challenge: LLMs generate texts that distinguish between dogs and cats, animate and inanimate, concrete and abstract—but this knowledge exists only implicitly in weight matrices. We need to extract this latent ontological structure and make it explicit and inspectable.

Recent work by Christopher Manning and colleagues at Stanford has shown that neural networks encode rich linguistic structure—syntax trees, for instance—that can be extracted through systematic probing. I'm proposing we extend these methods from linguistic structure to ontological structure: the categories, hierarchies, and affordances that organize conceptual knowledge.

The key insight is that ontology is implicit in syntax. Verbs select for certain kinds of subjects and objects based on categorical presuppositions. "Eat" requires an animate agent and edible patient. These selectional restrictions reveal the categorical structure underneath. By systematically probing syntactic behavior, clustering words by shared patterns, and validating through transformation tests, we can extract the ontologies LLMs have learned.
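A toy sketch of that idea, with invented data: represent each word by the set of verbs that accept it as a direct object, then group words whose verb contexts overlap. The verb lists and the 0.4 similarity threshold are illustrative assumptions, not a claim about how any actual probing study works.

```python
# Cluster words by shared selectional restrictions (hypothetical data).

def jaccard(a, b):
    """Overlap between two context sets (1.0 = identical)."""
    return len(a & b) / len(a | b)

def cluster(contexts, threshold=0.4):
    """Greedily group words whose contexts overlap above the threshold."""
    clusters = []
    for word, ctx in contexts.items():
        for group in clusters:
            if all(jaccard(ctx, contexts[m]) >= threshold for m in group):
                group.append(word)
                break
        else:
            clusters.append([word])
    return clusters

# Which verbs accept each word as an object (invented probe results).
contexts = {
    "dog":   {"feed", "pet", "chase"},
    "cat":   {"feed", "pet"},
    "cup":   {"fill", "wash", "drop"},
    "plate": {"fill", "wash", "drop"},
}

print(cluster(contexts))  # animate words group together, artifacts together
```

Even this crude procedure recovers an animate/artifact split from nothing but verb-argument patterns, which is the point: the categorical structure is latent in syntactic behavior.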

This work must be distributed across many research groups, each focusing on specific domains—medical ontologies, legal ontologies, physical systems ontologies, and so forth. No single lab has the expertise or resources to map the entire territory. We need shared infrastructure (probing tools, ontology repositories, validation benchmarks) and coordinated standards, but the actual extraction work happens in specialized communities with deep domain knowledge.

The payoff: explicit ontological structure that can be verified, debugged, systematically improved, and integrated with symbolic reasoning systems. We transform opaque neural networks into hybrid systems that combine learning with legible structure.

Some background:

Christopher Manning et al. Emergent linguistic structure in artificial neural networks trained by self-supervision, PNAS 2020, https://www.pnas.org/doi/full/10.1073/pnas.1907367117

William Benzon, ChatGPT: Exploring the Digital Wilderness, Findings and Prospects, https://www.academia.edu/127386640/ChatGPT_Exploring_the_Digital_Wilderness_Findings_and_Prospects (see especially pp. 28-38 and 42-44)

Activity 2: Cognitive Models and Multimodal Grounding

The challenge: Extracting ontologies from language gives us how language talks about the world, not how minds represent the world for perception and action. A robot needs more than linguistic categories—it needs grounded representations that integrate vision, touch, motor control, and yes, language, into a unified cognitive model. This distinction is standard in the cognitive sciences, including “classical” symbolic AI. I picked it up in the work I did with David Hays in the 1970s on cognitive networks for natural language semantics. We conceived of language mechanisms as operating on a separate cognitive model—language is an interface to the model, not the container of it. For embodied AI and robotics, this becomes crucial.

Consider a cup. The linguistic ontology tells us: cup is-a container, is-a artifact, can-hold liquids. The cognitive model adds: cylindrical shape with hollow interior, graspable via handle, stable on flat surfaces, rigid, will break if dropped, liquid spills if tilted beyond 45 degrees. This is sensorimotor knowledge grounded in perception and action, not purely linguistic.
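The two layers can be made explicit in a toy data structure. This is my illustration, not a proposal from the sources cited here; the field values simply restate the cup example from the paragraph above.

```python
# Linguistic ontology facts alongside grounded sensorimotor properties.
from dataclasses import dataclass

@dataclass
class ObjectModel:
    name: str
    is_a: list              # linguistic ontology: category memberships
    can_hold: list          # linguistic ontology: relational facts
    shape: str              # sensorimotor: perceived geometry
    graspable_via: str      # sensorimotor: affordance for action
    spill_angle_deg: float  # sensorimotor: causal knowledge

cup = ObjectModel(
    name="cup",
    is_a=["container", "artifact"],
    can_hold=["liquids"],
    shape="hollow cylinder",
    graspable_via="handle",
    spill_angle_deg=45.0,
)
print(cup.is_a, cup.spill_angle_deg)
```

A language-only system populates the first two fields; the rest must come from perception and action, which is why the cognitive model cannot simply be read off from text.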

Current multimodal systems (like GPT-4V or Gemini) take vision and "linguistify" it—everything gets processed through language. What we need are systems where multiple modalities read and write to a common cognitive model. Vision contributes spatial structure, language contributes categorical relationships, action contributes causal understanding, and they all integrate.

This research connects directly to robotics. A robot exploring a new kitchen needs to build spatial maps, identify affordances, understand causal relationships (that knob controls that burner), and eventually respond to linguistic commands—all drawing on the same underlying world model. The cognitive model is where the "adhesion" component of meaning lives: the grounding in physical reality that pure language systems lack.

Some background: Gary Marcus, Generative AI’s crippling and widespread failure to induce robust models of the world, Marcus on AI, June 28, 2025, https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread

Activity 3: Associative Drift and Discovery

The challenge: Current AI systems are reactive, not curious. They solve problems you give them but don't discover problems worth solving. They lack what I'm calling associative drift—the capacity for open-ended, low-bandwidth exploration that enables serendipitous discovery.

Think about how intellectual discovery actually works. When I searched "Xanadu" on the web years ago, I had no hypothesis—just idle curiosity. When I got 2 million hits, I had a hunch that seemed interesting (though I couldn't articulate why). The opportunity cost of investigating was low, so I poked around. Eventually I discovered distinct cultural lineages (sybaritic via Citizen Kane, cybernetic via Ted Nelson's hypertext project) that revealed something about how cultural memes evolve.

This is fundamentally different from task-directed reasoning. I wasn't trying to solve a predefined problem. I was in a low-bandwidth exploratory mode, sensitive to interesting patterns, following hunches without clear goals. Current LLMs operate only in high-bandwidth mode: given a prompt, they generate detailed responses. They can't "skim" or "wonder" or "notice something odd" without generating full text.

We need architectures that support dual-mode processing: high-bandwidth for focused problem-solving, low-bandwidth for pattern detection during exploration. This requires technical innovations (sparse attention patterns, adaptive computation, salience detection) and new ways of thinking about AI objectives. How do we train systems to explore productively without specific goals?

For robotics, this is essential. A robot with associative drift doesn't just execute commands—it develops intuitions about its environment through undirected exploration, notices regularities, forms hunches about what matters. It becomes genuinely curious rather than merely reactive.

The interesting twist: associative drift needs the other programs. Ontologies provide the structured space that makes certain patterns "interesting" (ontologically distant concepts appearing together). Cognitive models enable embodied drift (noticing patterns through physical interaction). And drift enables discovery in the other programs (finding ontological incoherences, noticing when modalities misalign).


Rodney Brooks on the state of AI and Robotics

As you may know, Rodney Brooks has been keeping an annual scorecard for various categories of high-tech activity. He puts it online on the first of the year. I’ve listed some excerpts from the 2026 scorecard below. The scorecard has much, much more that I haven’t excerpted.

The Falcon 9

Eight years ago, Falcon 9 had been launched 46 times, all successful, over the previous eight years, and it had recently had a long run of successful landings of the booster whenever attempted. At that time five launches had been on a previously used booster, but there had been no attempts to launch Falcon Heavy with its three boosters strapped together.

Now we are eight years on from those first eight years of Falcon 9 launches. The scale and success rate of the launches has made each individual launch an unremarkable event, with humans being launched a handful of times per year. Now the Falcon 9 scorecard stands at 582 launches with only one failed booster, and there have been 11 launches of the three booster Falcon Heavy, all successful. That is a sustained growth rate of 38% year over year for eight years. And that is a very high sustained deployment growth rate for any complex technology.
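One way to reproduce that 38% figure (my arithmetic, not Brooks’s): treat the growth of cumulative launches from 46 to 582 over eight years as compound annual growth.

```python
# Implied compound annual growth rate over eight years.
rate = (582 / 46) ** (1 / 8) - 1
print(f"{rate:.1%}")  # in the neighborhood of 37-38% per year
```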

There is no other modern rocket with such a volume of launches that comes even close to the Falcon 9 record. And I certainly did not foresee this volume of launches. About half the launches have had SpaceX itself as the customer, starting in February 2018, launching an enormous satellite constellation (about two thirds of all satellites ever orbited) to support Starlink bringing internet to everywhere on the surface of Earth.

[Not AI or robotics, I know. But it interests me.] 

Humanoid Robots

My blog post from September details why the current learning based approaches to getting dexterous manipulation will not get there anytime soon. I argue that the players are (a) collecting the wrong data and (b) trying to learn the wrong thing. I also give an argument (c) for why learning might not be the right approach. My argument for (c) may not hold up, but I am confident that I am right on both (a) and (b), at least for the next ten years.

I also outline in that blog post why the current (and indeed pretty much the only, for the last forty years) method of building bipeds and controlling them will remain unsafe for humans to be nearby. I pointed out that the danger is roughly cubically proportional to the weight of the robot. Many humanoid robot manufacturers are introducing lightweight robots, so I think they have come to the same conclusion. But the side effect is that the robots can not carry much payload, and certainly can’t provide physical support to elderly humans, which is a thing that human carers do constantly — these small robots are just not strong enough. And elder care and in-home care is one of the main arguments for having human-shaped robots, adapted to the messy living environments of actual humans.

Given that careful analysis from September I do not share the hype that surrounds humanoid robotics today. Some of it is downright delusional across many different levels.

At the end:

Meanwhile here is what I said at the end of my September blog post about humanoid robots and teaching them dexterity. I am not at all negative about a great future for robots, and in the nearish term. It is just that I completely disagree with the hype arguing that building robots with humanoid form will magically make them useful and deployable. These particular paragraphs followed a passage where I described, as I do again in this blog post, how the meaning of self driving cars has drifted over time.

Following that pattern, what it means to be a humanoid robot will change over time.

Before too long (and we already start to see this) humanoid robots will get wheels for feet, at first two, and later maybe more, with nothing that any longer really resembles human legs in gross form. But they will still be called humanoid robots.

Then there will be versions which variously have one, two, and three arms. Some of those arms will have five fingered hands, but a lot will have two fingered parallel jaw grippers. Some may have suction cups. But they will still be called humanoid robots.

Then there will be versions which have a lot of sensors that are not passive cameras, and so they will have eyes that see with active light, or in non-human frequency ranges, and they may have eyes in their hands, and even eyes looking down from near their crotch to see the ground so that they can locomote better over uneven surfaces. But they will still be called humanoid robots.

There will be many, many robots with different forms for different specialized jobs that humans can do. But they will all still be called humanoid robots.

As with self driving cars, most of the early players in humanoid robots will quietly shut up shop and disappear. Those that remain will pivot and redefine what they are doing, without renaming it, to something more achievable and with, finally, plausible business cases. The world will slowly shift, but never fast enough to need a change of name from humanoid robots. But make no mistake, the successful humanoid robots of tomorrow will be very different from those being hyped today.

Neural networks

Despite their successes with language, LLMs come with some serious problems of a purely implementation nature.

First, the number of examples that need to be shown to a network for it to learn to be facile in language takes up enormous amounts of computation, so that the costs of training new versions of such networks are now measured in the billions of dollars, consuming an amount of electrical power that requires major new investments in electrical generation, and the building of massive data centers full of millions of the most expensive CPU/GPU chips available.

Second, the number of adjustable weights is counted in the hundreds of billions, meaning they occupy over a terabyte of storage. RAM that big is incredibly expensive, so the models can not be used on phones or even lower cost embedded chips in edge devices, such as point of sale terminals or robots.
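A back-of-envelope check of that terabyte claim (my arithmetic, not Brooks’s): at two bytes per weight (16-bit floats), half a trillion weights occupy one terabyte, and 8-bit storage would halve that.

```python
# Storage footprint of a model as a function of parameter count.
def model_size_tb(n_params, bytes_per_weight=2):
    return n_params * bytes_per_weight / 1e12

print(model_size_tb(500e9))      # 1.0 TB at 16 bits per weight
print(model_size_tb(500e9, 1))   # 0.5 TB at 8 bits per weight
```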

These two drawbacks mean there is an incredible financial incentive to invent replacements for each of (1) our humble single neuron models that are close to seventy years old, (2) the way they are organized into networks, and (3) the learning methods that are used.

That is why I predict that there will be lots of explorations of new methods to replace our current neural computing mechanisms. They have already started and next year I will summarize some of them. The economic argument for them is compelling. But moving from initial laboratory explorations to viable scalable solutions will take much longer than everyone assumes. My prediction is there will be lots of interesting demonstrations but that ten years is too small a time period for a clear winner to emerge. And it will take much, much longer for the current approaches to be displaced. But plenty of researchers will be hungry to do so.

LLMs

So we all know we need guard rails around LLMs to make them useful, and that is where there will be a lot of action over the next ten years. They can not be simply released into the wild as they come straight from training.

This is where the real action is now. More training doesn’t make things better necessarily. Boxing things in does.

Already we see companies trying to add explainability to what LLMs say. Google’s Gemini now gives real citations with links, so that human users can oversee what they are being fed. Likewise, many companies are trying to box in what their LLMs can say and do. Those that can control their LLMs will be able to deliver useable product.

A great example of this is the rapid evolution of coding assistants over the last year or so. These are specialized LLMs that do not give the same sort of grief to coders that I experienced when I first tried to use generic ChatGPT to help me. Peter Norvig, former chief scientist of Google, has recently produced a great report on his explorations of the new offerings. Real progress has been made in this high impact, but narrow use field.

New companies will become specialists in providing this sort of boxing in and control of LLMs.

A note on embodiment

But since 1991 I have made a distinction between two concepts, where a machine or creature can be either, neither, or both situated and embodied. Here are the exact definitions that I wrote for these back then:

[Situatedness] The robots are situated in the world—they do not deal with abstract descriptions, but with the here and now of the world directly influencing the behavior of the system.

[Embodiment] The robots have bodies and experience the world directly—their actions are part of a dynamic with the world and have immediate feedback on their own sensations.

At first glance they might seem very similar. And they are, but they are also importantly different. And, spoiler alert, I think much of the work at companies, large and small, right now, is trying to abstract out the embodiment of a robot, turning it into a machine that is merely situated.

Later:

Being both situated and embodied is still a challenge to robots in the world. [[Now here is the most important sentence of this whole blog post.]] I think the training regimes that [are] being used for both locomotion and dexterity are either ignoring or trying to zero out the embodiment of physical robots, their inertias and forces, reducing them to merely being situated, just apps with legs and arms, characters in video games, not the reality of real physical beings that the tasks we want them to do [require].

Puttering around as a behavioral mode

Over the weekend I got to thinking about the activity of puttering around. I wondered whether or not there has been any formal study of it. So I put the following prompt to ChatGPT 5.2:

I’ve talked a lot about DMN (default mode network) and mind wandering in connection with creativity. It seems to me that puttering around is in the same ballpark. I’m not thinking and pondering, but I’m doing stuff, rather casually and in no forced order. Just as things come. It seems I like to putter early in the morning, on Friday afternoon, and over the weekend. Thus I just spent two days puttering around.

Since I’ve not had a “day job” in years, the distinction between weekdays and weekends is not forced on me. Yet it remains with me. It structured my time for most of the year in primary and secondary school. A bit less so in undergraduate and graduate school, but still operative. It also functioned when I had a faculty position at RPI (1978-1985) and at those times later on when I had one of those day jobs. So, it’s become deeply internalized though no longer externally imposed.

Its response took the form of numbered headings, subheadings, and bulleted lists. I find that sometimes useful, sometimes annoying. Since I wanted to write it up in a blog post I asked ChatGPT to express that material in the form of expository prose. I’ve put its response below the asterisks.

* * * * *

I’ve been thinking about puttering—not thinking hard, not resting, but doing things casually and without a forced order. It feels important, especially in connection with creativity, but I don’t know of any literature that treats it directly. What follows is an attempt to make sense of that intuition.

Cognitive science has devoted enormous attention to the default mode network (DMN)—a set of brain regions that become active when we are not engaged in externally directed tasks. The DMN is associated with mind wandering, autobiographical memory, imaginative simulation, and creative recombination. It is usually contrasted with task-positive networks that dominate during focused, goal-directed work.

But this framing leaves out a familiar and, I suspect, crucial mode of everyday cognition: puttering around.

Puttering is not rest, and it is not deliberate thinking. It is also not task-oriented in any strong sense. When I am puttering, I am doing things—moving objects, skimming texts, opening files, following small curiosities—but without urgency, hierarchy, or a clear endpoint. Attention shifts as affordances present themselves. One thing leads to another, but nothing insists.

This matters because puttering is not simply mind wandering in the absence of action. It is mind wandering coupled to low-stakes activity. That coupling provides just enough structure to keep cognition anchored in the world, while leaving associative processes free to roam. In that sense, puttering is an enacted version of default-mode cognition.

Most DMN research is organized around a sharp contrast: rest versus task. Subjects lie still in scanners or perform narrowly defined activities. What rarely gets studied is the large middle ground—self-paced, low-demand activity where executive control is attenuated but not absent. Yet it is precisely in this middle ground that many creative insights seem to arise.

One way to think about puttering is as a form of foraging. Cognitive scientists have developed mathematical models of information foraging and semantic search, showing how minds move through memory spaces under conditions of weak constraint. But these models are typically abstract and disembodied. Puttering, by contrast, is foraging with hands, eyes, and objects. It is exploratory behavior distributed across mind and environment.

Embodied and enactive theories of cognition come closer to capturing this phenomenon. They emphasize that cognition is not confined to the head but unfolds through interaction with the world. Still, even here, the focus is often on skilled action or purposeful engagement. Puttering remains oddly invisible: too mundane, too informal, too unproductive by conventional standards.

Creativity research acknowledges something similar under the heading of incubation. We are told that stepping away from a problem can allow unconscious processes to reorganize it. But incubation is usually framed as a gap between periods of work, not as a mode of activity in its own right. Puttering is active incubation—ongoing, embodied, and temporally permissive.

Time matters here. Even though I no longer have a “day job,” I notice that puttering reliably emerges at certain times: early in the morning, Friday afternoons, weekends. These temporal zones were once imposed externally—by school schedules, academic calendars, workweeks—but they have become internalized. They now function as cognitive affordances. Certain kinds of thinking feel licensed in those periods, while others feel out of place.

That licensing effect is important. Puttering lowers internal monitoring and performance pressure. It delays premature optimization. It allows partially formed ideas to coexist without being forced into coherence too soon. In more technical terms, it increases associative bandwidth while keeping executive control at a minimum viable level. For someone with a large and densely connected memory base, that is a powerful configuration.

Modern work culture tends to suppress puttering. It valorizes focus, efficiency, and measurable output, leaving little room for loosely structured activity that may not immediately “pay off.” Ironically, many contemporary AI systems explicitly reintroduce formal analogues of puttering: stochastic sampling, random walks, background exploration, temperature-controlled search. We design machines to do what we often deny ourselves permission to do.
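The "temperature-controlled search" mentioned above can be shown in miniature: scale log-probabilities by 1/temperature before renormalizing. High temperature flattens the distribution (looser, more exploratory sampling, the machine analogue of puttering); low temperature sharpens it toward the single best option. This is a standard sketch of the technique, with made-up probabilities.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a sampling temperature."""
    scaled = [math.log(p) / temperature for p in probs]
    z = sum(math.exp(s) for s in scaled)
    return [math.exp(s) / z for s in scaled]

probs = [0.7, 0.2, 0.1]
print(apply_temperature(probs, 2.0))  # flatter: exploration
print(apply_temperature(probs, 0.5))  # sharper: exploitation
```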

There is, as far as I know, no direct scientific literature on puttering around. Instead, it falls through the cracks between DMN research, embodied cognition, creativity studies, and ecological psychology. Yet that absence is revealing. Puttering occupies a cognitive regime that our current conceptual frameworks do not handle well: neither rest nor work, neither task nor distraction, but something quietly generative in between.

If that is right, then puttering is not a failure of discipline or a lapse in productivity. It is a mode of cognition in its own right—one that deserves recognition, protection, and perhaps eventually, study. 

* * * * *

Note, however, that I did not explicitly ask it to write in my voice. As you can see, though, that's what it did. Oh well.

Saturday, January 10, 2026

What a Wonderful World (Louis Armstrong) | Playing For Change

Playing For Change is proud to present this video of the song "What A Wonderful World" featuring Grandpa Elliott with children's choirs across the globe. In these hard times, children and music bring us hope for a better future. Today we celebrate life and change the world one heart and one song at a time!!

Classical improvisation by a Danish American Jewish comedian, Victor Borge

I'm bumping this one to the top of the queue because 1) I want to, 2) it's my blog, and 3) Borge is wonderful.

Victor Borge never got the memo saying that classical musicians stopped improvising sometime during the middle of the 19th century. So, when a fiddle player wanted to perform a tune that Borge had heard, but never played, Borge simply improvised an accompaniment. It's a bit over the top at points, but then Borge is a comedian. Watch how the two men interact with one another. There are points where one or the other doesn't quite know what's coming up, so they have to look and listen.
 
 
Pay attention at about 3:13, in a slow section (stuck in the middle of all the fast stuff). At about 3:43 Borge starts a nice counter melody; from 3:54 to the end it's nuts, with a nice counter melody in octaves at about 4:02. Notice the nice hesitation for the very last note, a skillful touch.

Direct brain-to-brain communication, redux

Why Learning Does Not Rescue Brain-to-Brain Thought Transfer 

Learning Is Not the Problem

There is no serious dispute, at this point, about the brain’s capacity to learn to incorporate new signal streams. Decades of work on motor prostheses, sensory substitution, neurofeedback, and tool use have demonstrated that the nervous system can adapt to novel inputs and outputs that are not part of its evolved repertoire. These systems work not because the brain passively receives meaning, but because it actively learns to coordinate new patterns of neural activity with action, perception, and feedback. Over time, what begins as an alien signal can become functionally integrated into the organism’s sensorimotor economy.

Acknowledging this plasticity does not weaken skepticism about direct brain-to-brain thought transfer. On the contrary, it sharpens the distinction between what is genuinely possible and what remains a fantasy. Learning is one thing. Zero-shot “plug-and-play” communication is something else entirely. The speculative proposals advanced by Elon Musk, Christof Koch, and Rodolfo Llinás depend not merely on plasticity, but on the assumption that meaningful mental content can be transferred between brains without a learning history, without negotiation, and without interpretive work. That assumption is precisely what fails.

The Zero-Shot Assumption

The defining feature of most brain-to-brain communication fantasies is immediacy. Thoughts are imagined to pass directly from one person to another, bypassing language, culture, and development. Koch’s examples of ghostly visual overlays and mind fusion, as well as Musk’s talk of “uncompressed conceptual communication,” all presuppose that the recipient brain can immediately make sense of neural activity originating elsewhere. The temporal dimension of learning—the weeks, months, or years required to integrate new signal regimes—is simply ignored.

This is not a minor omission. It is the conceptual hinge on which the entire proposal turns. Without a learning trajectory, there is no mechanism by which foreign neural activity could acquire meaning for the receiving brain. A signal does not become meaningful by virtue of its richness or bandwidth. It becomes meaningful only through use, within a system that can test, revise, and stabilize interpretations through action.

Why Learning Cannot Proceed in a Brain-Bridge

One might reply that learning could occur even in a brain-to-brain link, given enough time. But this response overlooks the conditions under which learning is possible in the first place. Learning requires a closed perception–action loop. The organism must be able to act on the basis of a signal, observe the consequences of that action, and adjust its internal dynamics accordingly. In brain–machine interfaces, this loop is explicit: the user moves a cursor, grasps an object, or modulates a tone, and receives immediate feedback. The signal becomes meaningful because it is embedded in a task space with clear success and failure conditions.

A direct brain-to-brain link provides no such structure. The receiving brain cannot act into the other brain in any systematic way, nor can it test hypotheses about what a given pattern of activity “means.” The signal stream has no stable reference point in the shared environment, no agreed-upon goal, and no external criterion of correctness. Under such conditions, learning has nothing to converge on. What is sometimes described as “another person’s thought” arrives as undifferentiated neural activity, untethered from the bodily and environmental contexts that made it meaningful in the first place.

The Persistent Problem of Origin

Even if one were to imagine some form of slow co-adaptation, a deeper problem remains: the brain must be able to distinguish between activity it generates itself and activity it should treat as input. In ordinary perception and action, this distinction is grounded in efference copy, proprioception, and the tight coupling between movement and sensation. These mechanisms allow the brain to tag certain patterns as self-generated and others as world-generated.

A foreign brain provides none of these anchors. Neural spikes arriving from another person’s cortex are indistinguishable, in their physical characteristics, from spikes arising endogenously. Without a principled way to mark activity as coming from an other, the receiving brain has no basis for interpretation, let alone learning. The problem is not noise in the engineering sense, but indeterminacy in the biological sense. The system lacks the resources to sort the signal at all.

Meaning Is Not a Payload

Underlying the zero-shot fantasy is a deeper theoretical mistake: the treatment of meaning as something that exists prior to expression and can therefore be transmitted once bandwidth constraints are removed. This is the same mistake that underwrites the conduit metaphor of language. Words are imagined as containers for thoughts, and communication as the transfer of those containers from one mind to another. Neuralink-style speculation simply replaces words with spikes, while leaving the basic picture intact.

But meaning does not work that way. Whether one follows Vygotsky, contemporary enactivism, or predictive-processing accounts, the conclusion is the same: meaning is enacted, not transmitted. It arises through socially scaffolded activity, through interaction with the world and with others, and through the internalization of those interactions in inner speech. There is no pre-linguistic, pre-social format of “pure thought” waiting to be uploaded or shared.

Augmentation Without Communion

None of this casts doubt on the medical and augmentative goals of current BCI research. Restoring motor function, providing artificial sensory channels, and extending human capabilities through learned interfaces are all plausible and worthwhile. But these technologies work precisely because they respect the conditions under which brains learn: limited task spaces, stable feedback, and prolonged adaptation. They augment agency; they do not merge subjectivities.

Direct brain-to-brain thought transfer, by contrast, promises communion without development, understanding without negotiation, and immediacy without practice. It imagines semantic interoperability where none can exist. For that reason, it fails not because the technology is immature, but because the underlying conception of thought, meaning, and learning is mistaken.

The issue, in the end, is not whether brains can change. They can, and they do. The issue is whether meaning can be detached from the histories that make it possible. On that point, the answer remains no.

Friday, January 9, 2026

Variations on a Wild Image

I decided to run a bunch of variations on the image that I used for the cover of my working paper, Serendipity in the Wild: Three Cases, With remarks on what computers can’t do (link to the blog post where I introduce it). I'm listing them in the order that I had ChatGPT create them. That will help explain why, for example, the shark in the Mughal version has a goofy look on its face and why that version has a mangled watch in the sand.

For the most part the prompt was simple: the name of a style or an artist. The manga style is my favorite. Some of the images are less successful than others.

The style of a Renaissance etching 

The style of a comic book devoted to science fiction stories. 

Manga style 

Japanese Ukiyo-e print.

I realize that the subject matter is, shall we say, anachronistic for ukiyo-e, but that goofy guy at the lower right is a freakin' bridge too far.

Chuck Jones style (Warner Brothers). 

Serendipity and the Structure of Discovery: What Accidents Reveal About Human Creativity


An essay written by Claude in response to a long prompt I took from a ChatGPT discussion of serendipitous discovery.


Introduction

When Alexander Fleming returned from vacation in September 1928 to find his bacterial cultures contaminated with mold, he faced a choice that any laboratory researcher would recognize: discard the ruined plates and start fresh, or pause to ask why a clear zone had formed around the contamination where bacteria refused to grow. Fleming paused. That moment of curiosity about a laboratory accident eventually led to penicillin and transformed modern medicine. But what made Fleming pause? What cognitive structure allowed him to recognize meaning in what others would have seen as mere mess?

The phenomenon we call serendipity—the accidental discovery of something valuable while looking for something else—offers a unique window into the nature of human creativity. These moments reveal not just how discoveries happen, but what kind of minds can make them happen. As we develop increasingly sophisticated artificial intelligence systems, the question becomes more pressing: can machines be serendipitous? Or does serendipity require something that current computational approaches cannot replicate?

The Anatomy of Accident

The history of science and technology is studded with serendipitous discoveries, each following a recognizable pattern. In 1895, Wilhelm Röntgen was experimenting with cathode rays when he noticed a fluorescent screen glowing across the room, well away from his apparatus and supposedly shielded from it. Rather than dismissing this peripheral observation as irrelevant to his experimental target, he investigated. X-rays emerged not from his intended line of inquiry but from his willingness to follow an anomaly.

Percy Spencer's discovery of the microwave oven in 1945 followed a similar trajectory. Working on radar equipment at Raytheon, Spencer noticed a chocolate bar melting in his pocket. This bodily experience—the unexpected warmth, the mess—could easily have been dismissed as an irritating side effect of standing near the magnetron. Instead, Spencer recognized it as a signal worth pursuing. He experimented with popcorn kernels, then eggs, systematically exploring what this accident might reveal about microwave radiation's interaction with food.

Charles Goodyear's 1839 discovery of vulcanized rubber came from dropping a rubber-sulfur mixture onto a hot stove. The accident occurred under extreme conditions he had not planned to test. The mixture didn't become sticky and useless as expected; it charred slightly but became elastic and durable. Goodyear recognized immediately that this accidental phase change revealed something fundamental about rubber's material properties.

In the chemical industry, Constantin Fahlberg's 1879 discovery of saccharin followed yet another pattern. After a day working with coal tar derivatives, Fahlberg noticed his hands tasted sweet. Rather than simply washing them and moving on, he traced the taste back to his laboratory bench, systematically testing compounds until he isolated saccharin. A bodily sensation—taste—became the signal that something interesting had occurred.

These canonical examples share a structure: an accident occurs, someone notices it, and rather than treating it as noise or contamination, they investigate. But this description conceals the deeper mystery. Accidents happen constantly in laboratories and workshops. Most are indeed noise. Most contaminated cultures should be discarded. Most peripheral observations are irrelevant. What distinguishes the accidents that matter?

The Prepared Mind and Structure

Louis Pasteur famously observed that "chance favors only the prepared mind." But what constitutes preparation? It cannot simply mean knowing what you're looking for, since serendipity precisely involves finding something you weren't seeking. The preparation must be of a different kind—not a prepared answer but a prepared capacity to ask new questions.

Consider the more recent case of Gila monster venom leading to GLP-1 drugs like Ozempic and Wegovy. A researcher collected the venom decades ago, not with any specific therapeutic application in mind but out of what might be called biological curiosity—a sense that unusual biochemical systems might someday prove valuable. The venom sat in freezers for years, creating what we might call "latent option value." Only much later, when researchers were investigating glucose metabolism, did a peptide from that venom reveal its ability to regulate blood sugar. The initial collector had no hypothesis about diabetes drugs. They were simply gathering interesting biological materials on the principle that interesting systems often prove useful.

This pattern of open-ended harvesting without specific goals appears throughout the history of discovery. It suggests that preparation involves not just deep knowledge of a field but a particular cognitive stance toward the world—one that treats anomalies as potentially meaningful rather than merely aberrant, that maintains curiosity even when immediate applications are unclear, that builds resources of knowledge "just in case" rather than only "just in time."

We might think of each researcher as carrying a unique "snapshot" of the world's structure built up through years of experience, false starts, and accumulated hunches about how their domain works. When an accident occurs, it encounters not a blank slate but this richly structured mental model. Fleming noticed the bacterial clearing because decades of bacteriological work had taught him to pay attention to bacterial behavior. Spencer recognized the melted chocolate as meaningful because his work with radar had given him intuitions about electromagnetic radiation's effects. Fahlberg traced the sweet taste back to his bench because his training had taught him to attend to unexpected chemical properties.

The structure in a researcher's mind is necessarily recursive—it includes models not just of phenomena but of how phenomena reveal themselves, how observations connect to explanations, how accidents might signal deeper patterns. When Röntgen saw the unexpected glow, his response emerged from an understanding not just of cathode rays but of how scientific instruments can reveal hidden aspects of nature. The accident struck a prepared mind that was structured to wonder about such revelations.

Opportunity Cost and the Economics of Curiosity

Yet preparation alone cannot explain serendipity. Researchers often ignore anomalies not because they fail to notice them but because investigating would be too costly. Here the economics of exploration becomes crucial.

The 3M chemist Spencer Silver was trying to create a strong adhesive in 1968 but instead produced a weak, reusable one—initially a failure. The adhesive could have been immediately discarded as not meeting specifications. But 3M's culture of tinkering and the low cost of keeping failed experiments around meant the weak adhesive persisted in the laboratory. Years later, Art Fry was singing in his church choir and growing frustrated with bookmarks that kept falling out of his hymnal. The two problems met: weak adhesive plus temporary bookmark need equals Post-It Notes.

The discovery succeeded not because anyone had a brilliant initial hypothesis but because the cost of keeping a "failed" experiment was low enough that it could persist until finding its proper application. Most hunches and accidents lead nowhere. But if you can keep them around cheaply, occasionally one pays off spectacularly.

The Kellogg brothers' discovery of corn flakes followed a similar pattern. They accidentally left cooked wheat sitting out overnight. The frugal thing was to use it anyway rather than waste it. When they rolled the stale wheat and discovered it flaked, economic constraint—don't waste the wheat—had created the conditions for discovery. If the opportunity cost of wasting wheat had been low, there would be no corn flakes.

Pfizer's development of Viagra illustrates another dimension of this economic logic. The compound sildenafil was being tested as a treatment for angina and hypertension. In clinical trials, it showed modest effects on the intended conditions but remarkable effects on male erectile function. The company could have abandoned the compound as a failure in its intended application. Instead, recognizing that the "side effect" might be more valuable than the original purpose, they pursued an entirely different market. The opportunity cost of investigating this unexpected effect was low—they'd already done much of the safety testing—and the potential payoff was enormous.

In the wine industry, champagne itself emerged from a similar reframing. For centuries, wine that re-fermented in bottles was considered faulty—the bottles might explode, the wine turned fizzy instead of still. But gradually, vintners in the Champagne region recognized that this "flaw" could be valuable. Rather than fighting re-fermentation, they developed techniques to control it. An expensive problem (exploding bottles) became a premium product once someone reframed the accident as an opportunity.

Roy Plunkett's 1938 discovery of Teflon reveals yet another aspect of low opportunity costs. He found that a cylinder of refrigerant gas had mysteriously solidified. The standard procedure would be to discard the cylinder as contaminated or defective. But Plunkett's curiosity was cheap to satisfy—it took only a few minutes to saw open the cylinder and examine the white powder inside. That brief investigation revealed polytetrafluoroethylene, eventually leading to non-stick coatings.

The pattern across these cases is consistent: serendipity thrives when the cost of investigating anomalies is low relative to potential payoffs. When researchers or institutions can afford to keep "failed" experiments, to pursue unexpected effects, to investigate mysteries for their own sake, they create conditions for serendipitous discovery. When every resource must be justified against immediate objectives, serendipity becomes much rarer.
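The economic logic running through these cases can be restated as a toy expected-value calculation: keeping a "failed" experiment around is worthwhile whenever the (small) probability of an eventual payoff, times the size of that payoff, exceeds the cost of carrying it. The numbers below are purely illustrative assumptions, not data drawn from any of the cases above; the sketch only shows how the sign of the decision flips as carrying cost rises.

```python
def expected_net_value(p_payoff, payoff, carrying_cost):
    """Expected net value of keeping one 'failed' experiment around.

    p_payoff: probability it eventually finds a valuable application
    payoff: value realized if it does
    carrying_cost: cost of storing and occasionally revisiting it
    """
    return p_payoff * payoff - carrying_cost

# Illustrative assumption: a 1-in-500 chance of a $10M payoff.
# At a $1,000 carrying cost, keeping the anomaly around is a clear win;
# at a $50,000 carrying cost, the same anomaly gets discarded.
cheap_to_keep = expected_net_value(0.002, 10_000_000, 1_000)
costly_to_keep = expected_net_value(0.002, 10_000_000, 50_000)
print(cheap_to_keep, costly_to_keep)
```

On these assumed numbers the cheap case nets roughly +$19,000 in expectation and the costly case roughly −$30,000, which is the whole argument in miniature: serendipity is not free, but low carrying costs make it affordable.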

Thursday, January 8, 2026

Serendipity in the Wild: Three Cases, With remarks on what computers can’t do

New working paper. Title above. URLs, abstract, table of contents, and introduction below.

Academia.edu:
https://www.academia.edu/145860186/Serendipity_in_the_Wild_Three_Cases_With_remarks_on_what_computers_cant_do
SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6043814
ResearchGate: https://www.researchgate.net/publication/399584810_Serendipity_in_the_Wild_Three_Cases_With_remarks_on_what_computers_can't_do

Abstract: This paper examines intellectual creativity as it occurs in ordinary, open-ended scholarly practice, through three detailed case histories drawn from my own work in the humanities. Each case traces how a line of inquiry emerged without a predefined problem, method, or endpoint, and how a vague sense of interest gradually crystallized into a focused intellectual project.


The first case reconstructs the process by which an unplanned viewing of Jaws developed into a Girardian interpretation of the film, centered on the recognition of Quint as a sacrificial victim. The second follows an exploratory investigation of the term “Xanadu” on the early web, where a surprising search result led, through low-cost probing, to the identification of distinct cultural clusters. The third describes the discovery of a previously unrecognized center-point structure in Conrad’s Heart of Darkness, originating in the noticing of a minor narrative anomaly and pursued through opportunistic quantitative checks.

Across these cases, creative work proceeds through hunches, comparative wandering, sensitivity to salience, and decisions shaped by opportunity cost, rather than through the execution of well-defined tasks. Only late in each process does a clear problem boundary emerge, enabling more systematic reasoning.

The paper uses these cases to clarify a limitation of contemporary large language models and related AI systems. While such systems can operate effectively once a problem frame is supplied, they do not participate in the open-ended, exploratory processes by which humans discover what is worth investigating in the first place. The cases thus support a model of human–AI complementarity in which problem-finding remains a distinctively human contribution, and AI serves as a powerful tool once direction has been established.   

Table of Contents

Introduction: Serendipity in Mind 3
Intellectual creativity: Case 1, Interpreting Jaws 4

Watching Jaws 5
The game is afoot 6
I’m in! 6
What kind of a process is that? 7

Intellectual creativity: Case 2, The Xanadu meme 9

Cultural evolution 10
The cybernetic cluster appears 11
But still, what got me started? 12
Opportunity cost 13

Intellectual creativity: Case 3, Center-point construction in Heart of Darkness 15

Story and plot, Center-point construction 15
What would an AI do? 19
Is this real? How do we know? 21

Appendix 1: Summary and Evaluation by Claude 23

Summary 23
Evaluation 23
Conclusion 24

Appendix 2: Summary and Evaluation by ChatGPT 26

Summary 26
Overall Assessment 27
Human–AI Complementarity: Lessons from Three Cases 28

Appendix 3: Seven Brief Examples of Serendipity 30    

Introduction: Serendipity in Mind

In the three years since ChatGPT was released on the web, the sophistication of chatbots has increased a great deal, leading some to assert that it won’t be long before chatbots will be able to outperform humans on all intellectual tasks. I’m not so sure. Why? Because the problems they solve, the tasks they’ve been set, all seem to be bounded tasks in well-explored universes. As Johnson, Karimi, and Bengio have remarked in a recent article, many important intellectual tasks require “the ability to navigate intractable problems - those that are ambiguous, radically uncertain, novel, chaotic, or computationally explosive.” They talk of wisdom as the quality missing in current machines. I’m not sure what I think about that word, “wisdom”; it carries a lot of baggage.

I’m more comfortable talking about serendipity, not as an attribute of minds, or of situations, but as, well, whatever it is that serendipity characterizes. In the third appendix I present seven brief cases that I elicited from ChatGPT with a prompt, but the burden of my argument rests on accounts of three problems that I’ve worked on: an interpretation of Steven Spielberg’s Jaws, the distribution of the term “Xanadu” on the web (the “Xanadu meme”), and the discovery of center-point structure in Joseph Conrad’s Heart of Darkness. I present these cases, not because I think that they’re somehow special. My thinking in these cases doesn’t seem to me to be particularly abstract. It’s not “rocket science,” as the saying goes. I present those cases simply because they are mine and I know them in detail. I know what I did, when I did it, and why. What I did depended on experiences and “hunches” built up through experience. Those don’t strike me as the kinds of things amenable to current computational techniques.

Once I’d completed drafting those cases I presented them to both ChatGPT and Claude. I’ve included summary excerpts from those interactions as appendices. I’ll let Claude conclude this introduction:

Benzon's case studies effectively demonstrate that the most interesting intellectual work often begins not with problem-solving but with problem-finding. His examples show how experience, intuition, and willingness to follow hunches create opportunities for genuine discovery. While current AI systems are powerful analytical tools, they lack the kind of embodied experience and open-ended curiosity that drives human creativity. The work suggests that the future of AI-assisted scholarship may lie not in replacing human insight but in creating powerful partnerships where humans excel at boundary-setting exploration and AI excels at systematic analysis within those boundaries.

Under an umbrella

The compute theory of everything

Samuel Albanie, Reflections on 2025, December 30, 2025. The first section, of three, is entitled "The Compute Theory of Everything." Here's an excerpt:

I have come to believe that every engineer must walk the road to Damascus in their own time. One does not simply adopt the Compute Theory of Everything by hearing others discuss it. You have to be viscerally shocked by the pyrotechnics of scale in a domain you know too well to be easily impressed.

For many senior engineers, that shock arrived in 2025. I have watched colleagues who were publicly sceptical through 2023 and 2024 quietly start to integrate these systems into their daily work. The “this is just a stochastic parrot” grimace has been replaced by the “this stochastic parrot just fixed my RE2 regex”. They still say “this can’t do what I do”, but the snorts of laughter have been replaced with a thoughtful silence and the subtle refreshing of their LinkedIn profile.

My own conversion came earlier. It is a privilege of my career that I was working in one of the first fields to get unceremoniously steamrollered by scaling: Computer Vision. During a glorious period at the VGG in Oxford, I spent months crafting bespoke, artisanal architectural inductive biases. They were beautiful, clever, and they had good names. And then, in early 2021, my approach was obliterated by a simple system that worked better because it radically scaled up pretraining compute. I spent a full afternoon walking around University Parks in shock. But by the time I reached the exit, the shock had been replaced by the annoying zeal of a convert.

Returning to my desk, it did not take long to discover that the Compute Theory of Everything is 50 years old and has been waiting patiently in a Stanford filing cabinet since the Ford administration.

In 1976, Hans Moravec wrote an essay called “The Role of Raw Power in Intelligence”, a document that possesses both the punch and the subtlety of a hand grenade. It is the sort of paper that enters the room, clears its throat, and informs the entire field of Artificial Intelligence that their fly is down. Moravec’s central thesis is that intelligence is not a mystical property of symbol manipulation, but a story about processing power, and he would like to explain this to you, at length, using log scales and a tone of suppressed screaming.

He starts with biology, noting that intelligence has evolved somewhat independently in at least four distinct lineages: in cephalopods, in birds, in cetaceans, and in primates. He spends several pages on the brainy octopus covering the independent evolution of copper-based blood and the neural architecture of the arms, citing a documentary in which an octopus figures out how to unscrew a bottle to retrieve a tasty lobster from inside. One gets the impression he prefers the octopus to many of his colleagues. The evolutionary point is that intelligence is not a fragile accident of primate biology. It is a recurring architectural pattern the universe stumbles upon whenever it leaves a pile of neurons unattended. The octopus and the crow did not copy each other’s homework. Instead, they converged on the answer because the answer works. The question is: what is the underlying resource?

Moravec’s answer is: it’s the compute, stupid.

To make his point, he compares the speed of the human optic nerve (approximately ten billion edge-detection operations per second) to the PDP-10 computers then available at Stanford. The gap is a factor of more than a million. He calls this deficit “a major distorting influence in current work, and a reason for disappointing progress.” He accuses the field of wishful thinking, scientific snobbery, and (my favourite) sweeping the compute deficit under the rug “for fear of reduced funding.” It is the sound of a man who has checked the numbers, realized the Emperor has no clothes, and is particularly annoyed that the Emperor has neither a GPU nor a meaningful stake in God’s Chosen Company: Nvidia (GCCN).

This leads to his aviation aphorism that has become modestly famous, at least among the demographic that reads 1976 robotics working papers for recreational purposes: “With enough power, anything will fly.” Before the Wright brothers, serious engineers built ornithopters (machines that flapped their wings, looked elegant, and stayed resolutely on the ground). Most failed. Some fatally. The consensus was that AI was a matter of knowledge representation and symbolic reasoning, and that people who talked about “raw power” were missing the point and possibly also the sort of people who enjoy watching videos of monster truck rallies (a group that includes your humble author). Moravec’s point was that the Symbolic AI crowd were busy building ornithopters, obsessing over lift-to-drag ratios, while the solution was to strap a massive engine to a plank and give researchers the chance to brute-force the laws of physics into submission.

Twenty-two years later, he published an update, “When Will Computer Hardware Match the Human Brain?”, which opens with a sentence that has aged like a 1998 Pomerol:

“The performance of AI machines tends to improve at the same pace that AI researchers get access to faster hardware.”

He plots curves, whips up a Fermi estimate that human-level cognition requires on the order of 100 million MIPS, and predicts this capability will be available in affordable machines by the 2020s. The paper includes a chart in which various organisms and machines are arrayed by estimated computational throughput. The spider outperforms the nematode by a humiliating margin. Deep Blue appears as a reference point for what IBM’s R&D budget bought you in 1997, which was the ability to defeat Garry Kasparov at chess while remaining unable to recognise a photograph of a chess piece. The figure is instructive, but after staring at it for a few minutes, it can start to grate on one’s sensibilities. Perhaps because it treats the human soul as an arithmetic problem. Philosophy on two axes.
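The two Fermi estimates quoted above are easy to restate as arithmetic. The optic-nerve figure and the 100-million-MIPS estimate come from the excerpt; the PDP-10's effective edge-detection throughput below is an illustrative assumption on my part (each edge detection would have cost many machine instructions), chosen only to show the shape of the million-fold gap Moravec describes.

```python
# Moravec's 1976 comparison, redone as back-of-envelope arithmetic.
OPTIC_NERVE_OPS_PER_SEC = 1e10      # ~10 billion edge detections/sec (from the excerpt)
PDP10_EFFECTIVE_OPS_PER_SEC = 1e4   # assumed: instructions-per-edge-detection eats
                                    # most of the machine's raw instruction rate

gap = OPTIC_NERVE_OPS_PER_SEC / PDP10_EFFECTIVE_OPS_PER_SEC
print(f"compute deficit: {gap:.0e}x")  # the "factor of more than a million"

# His 1998 update pegs human-level cognition at ~100 million MIPS:
HUMAN_LEVEL_MIPS = 100e6
```

Nothing in the sketch is precise; the point is that under any reasonable assumption for the PDP-10 term, the deficit comes out in the millions, which is Moravec's "major distorting influence" in numeric form.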

There's more on the compute theory of everything, which is worth your while.

Let me add that, for myself, sure, we need enough compute. That's necessary, but not sufficient. LLMs are a limited architecture. Throwing more compute at them isn't going to solve all the problems. I've got a working paper that's relevant: What Miriam Yevick Saw: The Nature of Intelligence and the Prospects for A.I., A Dialog with Claude 3.5 Sonnet. Here's Claude's summary:

Wednesday, January 7, 2026

An underground American military base in Greenland in the 1950s (now abandoned)

00:00 Intro
02:17 No man’s land
03:44 Why America wants Greenland
06:55 The 1950s
09:41 A city with a secret
11:07 How to build Camp Century
16:11 Living under the ice
19:10 The end of Project Iceworm
20:42 The future