Monday, January 12, 2026

AI for the Next 30 Years: Four kinds of activity that should be pursued

Here’s the prompt I used to elicit a text from Claude:

I want to do a blog post setting forth those four programs. I want it to be between, say, 1000 and 2000 words, no more. It should have an introduction, sections for each of the four programs, and some final remarks. Give it a title like: AI for the next 30 years, an open-ended plan. Assume a college-educated readership that is generally sophisticated. I’m going to put it in my New Savanna blog along with another post in which I present excerpts from Rodney Brooks’ current remarks on technology.

I’ve spent a lot of time over the last three days conceptualizing those four programs, interacting with both ChatGPT 5.2 and Claude 4.5. Those chats, in turn, rest on work that I’ve done with both chatbots over the last three years. Moreover, I have uploaded a fair number of documents to those chatbots, both articles from the formal literature and informal working papers, going back five decades.

Note that AGI is not mentioned anywhere, nor did I ask ChatGPT to make dated predictions. Predicting where the earth will be in the solar system in 30 years, that’s easy. We’ve known how to do that since Newton. Predicting the weather 30 years out is very difficult to impossible, though maybe we can come up with rough estimates of average temperature and precipitation for the year. Predicting the 30-year evolution of a complex socio-cultural-technical system? Not on your life.

I’ve edited Claude’s text in some minor ways and added some links at the end of each section.

AI for the Next 30 Years

Large Language Models represent something fundamentally new in computing: systems that have learned vast amounts about the world but encode that knowledge implicitly, in billions of inscrutable parameters. We can use these systems—often impressively—but we don't truly understand what they know or how they organize that knowledge. It's as if we've discovered a vast wilderness: we can explore parts of it, but we lack comprehensive maps.

Over the past few years, I've been thinking about what it would take to map this territory systematically and transform it from mysterious wilderness into reliable infrastructure. This thinking has crystallized into four parallel research programs, each essential, each reinforcing the others. Unlike the prevailing Silicon Valley vision of one lab developing a superintelligent system that does everything, this is a distributed, collaborative, multi-decade effort requiring both technical innovation and institutional creativity.

Activity 1: Ontology Extraction

The challenge: LLMs generate texts that distinguish between dogs and cats, animate and inanimate, concrete and abstract—but this knowledge exists only implicitly in weight matrices. We need to extract this latent ontological structure and make it explicit and inspectable.

Recent work by Christopher Manning and colleagues at Stanford has shown that neural networks encode rich linguistic structure—syntax trees, for instance—that can be extracted through systematic probing. I'm proposing we extend these methods from linguistic structure to ontological structure: the categories, hierarchies, and affordances that organize conceptual knowledge.

The key insight is that ontology is implicit in syntax. Verbs select for certain kinds of subjects and objects based on categorical presuppositions. "Eat" requires an animate agent and edible patient. These selectional restrictions reveal the categorical structure underneath. By systematically probing syntactic behavior, clustering words by shared patterns, and validating through transformation tests, we can extract the ontologies LLMs have learned.
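
To make the probing idea concrete, here is a minimal sketch, assuming a masked language model from Hugging Face’s transformers library and scikit-learn for the clustering step; the sentence frames and word list are toy placeholders, not a validated probe set. The point is simply that nouns which fit the same verb frames end up with similar acceptance profiles, and clustering those profiles yields candidate categories.

```python
# Minimal sketch of selectional-restriction probing; not a validated method.
# Assumes Hugging Face `transformers` and scikit-learn; the model, frames,
# and noun list are illustrative placeholders.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
from sklearn.cluster import AgglomerativeClustering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

nouns = ["dog", "cat", "rock", "soup", "idea", "child", "hammer", "rain"]
frames = [                                 # each frame probes one selectional slot
    "The [MASK] ate the food.",            # animate agent of "eat"
    "She ate the [MASK].",                 # edible patient of "eat"
    "The [MASK] broke when it fell.",      # rigid, breakable object
    "The [MASK] decided to leave.",        # agent capable of intention
]

def slot_scores(frame: str) -> torch.Tensor:
    """Log-probability each candidate noun receives in the frame's [MASK] slot."""
    inputs = tokenizer(frame, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    log_probs = torch.log_softmax(logits, dim=-1)
    ids = [tokenizer.convert_tokens_to_ids(n) for n in nouns]
    return log_probs[ids]

# Each noun gets a "fitness profile" across frames; nouns with similar profiles
# presumably share a latent category (animate, edible, rigid, ...).
profiles = torch.stack([slot_scores(f) for f in frames], dim=1)  # shape [nouns, frames]
labels = AgglomerativeClustering(n_clusters=3).fit_predict(profiles.numpy())
for noun, label in zip(nouns, labels):
    print(f"{noun}: cluster {label}")
```

In practice the frame sets would be far larger, and the transformation tests mentioned above would be used to validate the resulting clusters.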

This work must be distributed across many research groups, each focusing on specific domains—medical ontologies, legal ontologies, physical systems ontologies, and so forth. No single lab has the expertise or resources to map the entire territory. We need shared infrastructure (probing tools, ontology repositories, validation benchmarks) and coordinated standards, but the actual extraction work happens in specialized communities with deep domain knowledge.

The payoff: explicit ontological structure that can be verified, debugged, systematically improved, and integrated with symbolic reasoning systems. We transform opaque neural networks into hybrid systems that combine learning with legible structure.

Some background:

Christopher Manning et al., Emergent linguistic structure in artificial neural networks trained by self-supervision, PNAS, 2020, https://www.pnas.org/doi/full/10.1073/pnas.1907367117

William Benzon, ChatGPT: Exploring the Digital Wilderness, Findings and Prospects, https://www.academia.edu/127386640/ChatGPT_Exploring_the_Digital_Wilderness_Findings_and_Prospects (see especially pp. 28-38, 42-44)

Activity 2: Cognitive Models and Multimodal Grounding

The challenge: Extracting ontologies from language gives us how language talks about the world, not how minds represent the world for perception and action. A robot needs more than linguistic categories—it needs grounded representations that integrate vision, touch, motor control, and yes, language, into a unified cognitive model. This distinction is standard in the cognitive sciences, including “classical” symbolic AI. I picked it up in the work I did with David Hays in the 1970s on cognitive networks for natural language semantics. We conceived of language mechanisms as operating on a separate cognitive model—language is an interface to the model, not the container of it. For embodied AI and robotics, this becomes crucial.

Consider a cup. The linguistic ontology tells us: cup is-a container, is-a artifact, can-hold liquids. The cognitive model adds: cylindrical shape with hollow interior, graspable via handle, stable on flat surfaces, rigid, will break if dropped, liquid spills if tilted beyond 45 degrees. This is sensorimotor knowledge grounded in perception and action, not purely linguistic.
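
To picture what such an entry might look like, here is a small illustrative sketch; the field names are assumptions made for the example, not a proposed formalism.

```python
# Illustrative sketch of a single cognitive-model entry; every field name here
# is an assumption made for the example, not an existing formalism.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    is_a: list[str] = field(default_factory=list)                # linguistic ontology
    shape: str | None = None                                      # from vision
    affordances: dict[str, bool] = field(default_factory=dict)    # from action
    physics: dict[str, object] = field(default_factory=dict)      # causal knowledge

cup = Concept(
    name="cup",
    is_a=["container", "artifact"],                  # what language gives us
    shape="hollow cylinder, handle on one side",     # what vision adds
    affordances={"graspable": True, "holds_liquid": True, "stackable": True},
    physics={"rigid": True, "breaks_if_dropped": True, "spill_angle_deg": 45},
)
```

The linguistic is-a links and the sensorimotor fields live in the same structure, which is the point: language is one way into the model, not the model itself.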

Current multimodal systems (like GPT-4V or Gemini) take vision and "linguistify" it—everything gets processed through language. What we need are systems where multiple modalities read and write to a common cognitive model. Vision contributes spatial structure, language contributes categorical relationships, action contributes causal understanding, and they all integrate.
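
Here is a schematic sketch of what reading and writing to a common cognitive model might mean, with each modality contributing its own kind of structure to one shared store. The class and method names are invented for the illustration and do not describe any existing system.

```python
# Schematic sketch of a shared cognitive model with per-modality interfaces.
# The class and method names are invented for illustration only.
class CognitiveModel:
    def __init__(self):
        self.entities: dict[str, dict] = {}   # object name -> feature dictionary
        self.relations: list[tuple] = []      # e.g. ("knob_3", "controls", "burner_2")

    def _entry(self, name):
        return self.entities.setdefault(name, {"is_a": [], "features": {}})

    def update_from_vision(self, name, shape, pose):
        # Vision contributes spatial structure: what it looks like and where it is.
        self._entry(name)["features"].update(shape=shape, pose=pose)

    def update_from_language(self, name, categories):
        # Language contributes categorical relationships (is-a links).
        entry = self._entry(name)
        entry["is_a"] += [c for c in categories if c not in entry["is_a"]]

    def update_from_action(self, subject, relation, obj):
        # Action contributes causal structure discovered by trying things out.
        self.relations.append((subject, relation, obj))

    def describe(self, name):
        # Any downstream process (planning, dialogue) reads the same model.
        return self.entities.get(name, {})

model = CognitiveModel()
model.update_from_vision("cup_1", shape="hollow cylinder with handle", pose=(0.4, 0.1, 0.9))
model.update_from_language("cup_1", ["cup", "container", "artifact"])
model.update_from_action("cup_1", "spills_when_tilted_past", "45_degrees")
model.update_from_action("knob_3", "controls", "burner_2")   # causal link found by exploring
print(model.describe("cup_1"))
```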

This research connects directly to robotics. A robot exploring a new kitchen needs to build spatial maps, identify affordances, understand causal relationships (that knob controls that burner), and eventually respond to linguistic commands—all drawing on the same underlying world model. The cognitive model is where the "adhesion" component of meaning lives: the grounding in physical reality that pure language systems lack.

Some background: Gary Marcus, Generative AI’s crippling and widespread failure to induce robust models of the world, Marcus on AI, June 28, 2025, https://garymarcus.substack.com/p/generative-ais-crippling-and-widespread

Activity 3: Associative Drift and Discovery

The challenge: Current AI systems are reactive, not curious. They solve problems you give them but don't discover problems worth solving. They lack what I'm calling associative drift—the capacity for open-ended, low-bandwidth exploration that enables serendipitous discovery.

Think about how intellectual discovery actually works. When I searched "Xanadu" on the web years ago, I had no hypothesis—just idle curiosity. When I got 2 million hits, I had a hunch that seemed interesting (though I couldn't articulate why). The opportunity cost of investigating was low, so I poked around. Eventually I discovered distinct cultural lineages (sybaritic via Citizen Kane, cybernetic via Ted Nelson's hypertext project) that revealed something about how cultural memes evolve.

This is fundamentally different from task-directed reasoning. I wasn't trying to solve a predefined problem. I was in a low-bandwidth exploratory mode, sensitive to interesting patterns, following hunches without clear goals. Current LLMs operate only in high-bandwidth mode: given a prompt, they generate detailed responses. They can't "skim" or "wonder" or "notice something odd" without generating full text.

We need architectures that support dual-mode processing: high-bandwidth for focused problem-solving, low-bandwidth for pattern detection during exploration. This requires technical innovations (sparse attention patterns, adaptive computation, salience detection) and new ways of thinking about AI objectives. How do we train systems to explore productively without specific goals?
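
To make the dual-mode idea a bit more tangible, here is a toy sketch, offered under explicit assumptions rather than as a proposed architecture: a cheap salience check runs continuously over whatever drifts past, and the expensive high-bandwidth mode is invoked only when something scores as surprising. The salience measure used here, ontological distance between co-occurring concepts, anticipates the point about ontologies made just below.

```python
# Toy sketch of dual-mode processing; the ontology, distance measure, and
# threshold are all assumptions made for the illustration.
import random

# A stand-in ontology: concept -> path from the root category.
ONTOLOGY = {
    "xanadu":        ["entity", "abstract", "place-name"],
    "hypertext":     ["entity", "abstract", "information-technology"],
    "pleasure-dome": ["entity", "concrete", "artifact", "building"],
    "cat":           ["entity", "concrete", "animate", "animal"],
    "sofa":          ["entity", "concrete", "artifact", "furniture"],
}

def ontological_distance(a, b):
    """Cheap, low-bandwidth salience signal: how little two concepts' paths share."""
    pa, pb = ONTOLOGY[a], ONTOLOGY[b]
    shared = sum(1 for x, y in zip(pa, pb) if x == y)
    return len(pa) + len(pb) - 2 * shared

def high_bandwidth_investigation(a, b):
    """Placeholder for the expensive mode: a full LLM call, a literature search, etc."""
    return f"Investigate why '{a}' and '{b}' keep turning up together."

def drift(stream, threshold=5):
    """Low-bandwidth mode: skim co-occurring concept pairs, escalate only on surprise."""
    for a, b in stream:
        if ontological_distance(a, b) >= threshold:
            yield high_bandwidth_investigation(a, b)

# Simulated idle browsing: random pairs of concepts drifting past.
stream = (tuple(random.sample(list(ONTOLOGY), 2)) for _ in range(20))
for hunch in drift(stream):
    print(hunch)
```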

For robotics, this is essential. A robot with associative drift doesn't just execute commands—it develops intuitions about its environment through undirected exploration, notices regularities, forms hunches about what matters. It becomes genuinely curious rather than merely reactive.

The interesting twist: associative drift needs the other programs. Ontologies provide the structured space that makes certain patterns "interesting" (ontologically distant concepts appearing together). Cognitive models enable embodied drift (noticing patterns through physical interaction). And drift enables discovery in the other programs (finding ontological incoherences, noticing when modalities misalign).

Some background:

Samuel G. B. Johnson, Amir-Hossein Karimi, Yoshua Bengio, et al., Imagining and building wise machines: The centrality of AI metacognition, arXiv:2411.02478v1 [cs.AI], https://doi.org/10.48550/arXiv.2411.02478

William Benzon, Serendipity in the Wild: Three Cases, With remarks on what computers can't do, January 8, 2026, https://www.academia.edu/145860186/Serendipity_in_the_Wild_Three_Cases_With_remarks_on_what_computers_cant_do

William Benzon, From Mirror Recognition to Low-Bandwidth Memory, August 8, 2025, https://www.academia.edu/143347141/From_Mirror_Recognition_to_Low_Bandwidth_Memory_A_Working_Paper (see especially pp. 9-14)

Activity 4: Institutional Infrastructure

The challenge: None of the above happens without solving the political economy problem. AI ontologies are public goods—non-rivalrous, high fixed costs, massive positive externalities. This creates market failure: private actors under-invest because they can't capture sufficient value, yet social benefits are enormous.

This is not primarily a technical problem; it's an institutional design problem. How do we fund comprehensive ontology extraction when any single company's investment benefits all its competitors? How do we coordinate distributed research across academic labs, industry groups, and international boundaries? How do we ensure extracted ontologies remain publicly accessible while enabling private innovation on top?

History offers precedents: the Internet's open protocols with competitive services; the Human Genome Project's public data enabling private biotech; weather data collected publicly but packaged commercially; electricity regulated as public utility. Each solved the public goods problem differently, and we need to learn from all of them.

The likely solution is hybrid and layered: foundational ontologies funded publicly and kept open; domain ontologies developed by industry consortia with open standards; specialized applications remaining competitive. This requires new governance structures—neither pure market nor pure state control—with careful attention to preventing monopolization while enabling innovation.

The institutional work is as important as the technical work and will take just as long. We're talking about decades of negotiation among stakeholders with competing interests: researchers want openness, companies want advantage, governments want oversight, the public wants benefit. Getting this right matters as much as getting the algorithms right.

Some background: Henry Farrell, Alison Gopnik, Cosma Shalizi, James Evans, Large AI models are cultural and social technologies, Science, 25 February 2025, https://www.science.org/doi/10.1126/science.adt9819

Final Remarks

These four programs form an integrated research agenda for the next 30 years. They're not sequential steps but parallel efforts that reinforce each other:

  • Ontologies provide explicit structure that makes cognitive models interpretable and guides associative drift toward interesting patterns.
  • Cognitive models ground ontologies in perception and action, enabling embodied drift and genuine understanding.
  • Associative drift discovers problems worth solving in the other programs and enables creative exploration.
  • Institutional infrastructure makes all three technically feasible at the scale required.

This vision differs fundamentally from the prevailing narrative of racing toward superintelligence. Instead of one breakthrough from one lab, it's distributed collaboration across communities. Instead of black-box capabilities, it's making structure explicit and inspectable. Instead of replacing human intelligence, it's building tools for human intellectual work—making knowledge more accessible, reasoning more transparent, discovery more systematic.

The wilderness must be mapped, not just traversed. And mapping requires not just explorers but surveyors, cartographers, institutions to maintain the maps, and agreements about standards and access. It's the work of a generation, but it's work worth doing.

The research programs are clear. The methods are increasingly viable. What remains is the coordinated will to undertake systematic exploration rather than just another gold rush into unmapped territory. Thirty years from now, we could have AI systems we actually understand, built on infrastructure we collectively maintain, serving purposes we can articulate and verify.

Or we could have another generation of inscrutable systems, concentrated in a few hands, whose capabilities we can use but never fully trust.

The choice is ours, but only if we choose deliberately.
