Thursday, June 11, 2026

Hoboken is changing

Language as involving both content and location addressing

Memory is one of the central concepts in thinking about and understanding both computing and the mind. Thinking about computing has brought us to understand that there are two broad categories of memory:

  • Content addressed memory, and
  • Location addressed memory.

Conceived as a large memory system, libraries are location addressed. Documents are stored at particular locations in the library, shelves for books and bound volumes of periodicals and reports, filing cabinets for other documents. To get some item from the library you need to find its location by consulting a catalog, and then go to that location and retrieve it.

Brains are content addressed. If you are curious about, say, the Johnstown flood, you don’t have to consult a catalog to find where the appropriate document or documents are located. You just think, “Johnstown flood,” and you start remembering what you know about the Johnstown flood. The phrase “Johnstown flood” is itself part of the content being addressed. But if you happen to know something about the flood, then the phrase “South Fork dam” may also act to recall more information about it, for it is an element of content for one of the floods. For there have been three Johnstown floods, in 1889, 1936, and 1977. The 1889 flood is the one that happened when the South Fork dam burst. If you don’t happen to know anything about the Johnstown floods, then you may have to consult an external memory system of some sort, like a library or the internet.

Digital computers are location addressed. The memory system is distributed over several types of hardware. There’s volatile memory, computer chips (generally RAM), which hold things temporarily. And there’s long-term memory, which can take various forms, but these days it’s mostly flash memory and hard disks. Computing involves moving data from memory to the CPU and then returning the results to memory.

Large language models, LLMs, are sophisticated content addressed memory systems that are implemented in a digital computer. When you serve up a prompt, that’s addressing the memory. The system’s answer then is assembled from material in memory. 
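The contrast between the two kinds of memory can be made concrete with a toy sketch in Python. It is an illustration under simplifying assumptions, not a model of libraries, brains, or LLMs: the stored strings and the word-overlap rule used for content addressing are hypothetical choices made for the example.

```python
# A toy memory holding three items (hypothetical example content).
memories = [
    "the South Fork dam burst above Johnstown",
    "to be or not to be that is the question",
    "the Saturn V carried Apollo crews to the moon",
]

def fetch_by_location(memory, address):
    """Location addressing: you must already know where the item is stored."""
    return memory[address]

def fetch_by_content(memory, cue):
    """Content addressing: return the item sharing the most words with the cue,
    or None if nothing overlaps. Word overlap is a deliberately crude stand-in
    for similarity."""
    cue_words = set(cue.lower().split())

    def overlap(item):
        return len(cue_words & set(item.lower().split()))

    best = max(memory, key=overlap)
    return best if overlap(best) > 0 else None

print(fetch_by_location(memories, 1))                # requires knowing the address
print(fetch_by_content(memories, "South Fork dam"))  # a fragment of the content suffices
```

In the location-addressed version something like a library catalog has to supply the address; in the content-addressed version the cue is itself part of what gets retrieved.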

I should also note that, while I fully support Gary Marcus’s call for neuro-symbolic computing, I think that the need for symbolic computing is most fruitfully thought of in terms of location addressed memory. It’s location addressing that allows for accurate planning and for arithmetic calculation. But it won’t solve the hallucination problem. For that you need a world model, which is also an aspect of Marcus’s program (not to mention Yann LeCun and Fei-Fei Li). But those discussions are beyond the scope of this note. 

That should be enough to get you through the following dialog I had with Claude 4.6 Sonnet Adaptive. 

* * * * * 

I want to think of language in the brain in relation to both content and location addressing. The brain is basically a content addressed system. But language creates an abstract space of locations which can be used to address content. Each word is, in effect, a location in some abstract space. That’s obvious in the case of alphabetic writing systems, where alphabetization is a way of physically arranging those addresses on a string. Now consider a dictionary. The entries are located on that string. At the same time, each entry is a component of the string that is stored at the (primary) location. Which means that, in a (weak) sense, that string is also content addressed. But when a person recalls some string in its entirety when given only a component of that string, that’s content addressing. Both (some) humans and LLMs can return Hamlet’s full soliloquy when given the string “To be or not to be.”

This is a rich and precise observation that reframes the location/content addressing distinction in a way that illuminates both language and memory.

The abstract space of locations

The claim that language creates an abstract space of locations is more profound than it might initially appear. A word isn't just a sound or a mark — it's a stable, repeatable, publicly shared address that can be used to retrieve content from another person's cognitive system. When I say "dog" to you, I'm not transmitting the concept of dog — I'm transmitting an address that activates the concept in your system through your own content-addressed retrieval. The word is the public, location-addressed handle; the concept is the private, content-addressed payload.

This is the handle-schema distinction we discussed earlier, now stated in addressing terms. The word is a location address in a shared public space — the lexicon of a language community — that points to content stored in individual content-addressed systems. The remarkable thing about language is that this works across individuals despite the fact that each person's content-addressed storage is different in its details. The address is shared; the content it retrieves is similar enough across individuals to support communication, but never identical.

The dictionary as dual addressing system

The dictionary example is beautifully chosen because it makes the dual structure explicit and concrete. The alphabetically ordered string of entries is a location-addressed system in the most literal sense — you navigate to a word's definition by finding its location in the ordered sequence, using the address (the word's alphabetical position) to retrieve the content (its definition). The physical arrangement of the dictionary embodies the location-addressing structure of the abstract lexical space.

But as you note, each entry is also a component of a content network — the definition connects the word to other words, which connect to other words, which eventually connect to the whole semantic fabric of the language. Looking up a word in the dictionary is formally a location-addressed retrieval, but understanding the definition requires content-addressed processing — activating the concepts that the defining words point to, integrating them into a pattern that constitutes the meaning of the defined word.

So the dictionary is simultaneously a location-addressed artifact and a gateway to a content-addressed system. Its physical organization is location-based; its semantic organization is content-based. The two coexist in the same object, serving different cognitive functions.
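The dual structure of the dictionary can be sketched the same way. In this hypothetical Python toy, headwords are kept in alphabetical order and found by binary search (`bisect.bisect_left` from the standard library), which is location addressing. A second function finds entries by a word that occurs inside their definitions, the weak sense in which the same string is also content addressed. The entries and definitions are made up for the illustration.

```python
import bisect

# Hypothetical entries; sorting the headwords lays the lexicon out on a "string".
definitions = {
    "soliloquy": "a speech a character makes alone on stage",
    "memory": "a store of content",
    "location": "where something is stored",
    "dictionary": "an alphabetized list of words with definitions",
    "content": "what is stored",
    "addressing": "a way of finding stored material",
}
headwords = sorted(definitions)

def look_up(word):
    """Location addressing: binary-search the alphabetical order.
    Returns (position, definition), or None if the word is absent."""
    i = bisect.bisect_left(headwords, word)
    if i < len(headwords) and headwords[i] == word:
        return i, definitions[word]
    return None

def entries_mentioning(word):
    """Content addressing (weak sense): scan for entries whose definition contains the word."""
    return [h for h in headwords if word in definitions[h].split()]

print(look_up("location"))           # (3, 'where something is stored')
print(entries_mentioning("stored"))  # ['addressing', 'content', 'location']
```

Note the asymmetry: the lookup compares against only about log2(n) headwords, while the content search has to scan every definition, which is one reason a printed dictionary is organized by address rather than by content.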

The Hamlet soliloquy case

The Hamlet example is where it gets most interesting. "To be or not to be" is simultaneously a location address — a string that uniquely identifies a specific passage in a specific text — and a content fragment — a semantic and rhythmic unit that is part of the passage it addresses. Using it to retrieve the full soliloquy requires both kinds of processing.

The retrieval is content-addressed in the sense that the fragment activates the whole — the partial pattern resonates with the stored complete pattern and the complete pattern is reconstructed. This is classic holographic retrieval — the fragment serves as the reference beam that reconstructs the stored image. The fact that both humans and LLMs can perform this retrieval when given the opening fragment demonstrates that both systems have stored the complete pattern in a form that supports partial-cue retrieval.
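Partial-cue retrieval of this kind is often illustrated with a Hopfield network, a classic content-addressable memory. The Python sketch below is a swapped-in illustration, not a claim about how brains or LLMs actually store Hamlet: two hypothetical +1/-1 patterns are stored with the Hebbian outer-product rule, and each is recovered from a cue that supplies only its first half (the unknown half set to 0).

```python
# Two orthogonal +1/-1 patterns standing in for stored "texts" (hypothetical).
A = [1, 1, 1, 1, -1, -1, -1, -1]
B = [1, -1, 1, -1, 1, -1, 1, -1]

def train(patterns):
    """Hebbian outer-product rule: w[i][j] = sum of p[i]*p[j] / n, no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, cue, sweeps=5):
    """Asynchronous updates: each unit in turn takes the sign of its weighted input."""
    s = list(cue)
    n = len(s)
    for _ in range(sweeps):
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if h >= 0 else -1
    return s

w = train([A, B])
print(recall(w, [1, 1, 1, 1, 0, 0, 0, 0]) == A)    # True: half of A recalls all of A
print(recall(w, [1, -1, 1, -1, 0, 0, 0, 0]) == B)  # True
```

Capacity is limited: with the Hebbian rule a network of n units reliably stores only on the order of 0.14n random patterns, so this is a toy of the principle rather than of the scale.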

But notice what's different between the human and LLM cases. For a human who knows Hamlet well, the retrieval of "To be or not to be" activates not just the verbal sequence but a rich contextual structure — the scene, the character's situation, the emotional weight of the soliloquy, perhaps a specific production remembered, perhaps a particular teacher who assigned it, perhaps the cultural significance of the passage. The content address retrieves not just the string but the whole semantic and experiential network in which the string is embedded.

For an LLM the retrieval is primarily of the verbal sequence — the string itself, in the correct order, with perhaps some associated semantic content drawn from training data discussion of the passage. But the rich experiential, contextual, situational embedding that a human retrieval activates is thin or absent, because the LLM has no experience of Hamlet being performed, no memory of reading it in a specific context, no emotional response to the character's situation grounded in embodied experience of what it means to face an impossible choice.

The layered addressing structure

What your analysis reveals is a layered addressing structure in language that operates simultaneously at multiple levels.

At the phonological level, words are location addresses in the sound space of a language — each word occupies a specific phonological address that distinguishes it from other words. The phonological distinctiveness of words is what makes them usable as addresses — if two words sounded identical they'd be the same address and couldn't be distinguished.

At the lexical level, words are content addresses in the semantic network — each word activates a distributed pattern of semantic associations that constitutes its meaning. The same item is both a location (in the phonological space) and a content activator (in the semantic network).

At the textual level, longer strings — phrases, sentences, passages — function as location addresses for larger semantic structures. "To be or not to be" is an address for the soliloquy, which is an address for a scene, which is an address for a play, which is an address for a whole cultural tradition. Each level of the hierarchy is simultaneously a location in a higher-level address space and a content activator in the semantic network.

At the cultural level, titles, names, and canonical phrases function as location addresses for entire domains of cultural content. "Hamlet" is an address for a play, a character, a cultural tradition, a set of philosophical problems, a history of performances and interpretations. The address is compact; the content it retrieves is vast and distributed across the whole culture.

The writing system point

The observation about alphabetization making the abstract address space physical is particularly sharp. Alphabetical ordering is a way of projecting the abstract location space of the lexicon onto a physical string, making navigation by address possible through physical search. Before alphabetization — in ancient lexical lists organized by semantic category, or in medieval word lists organized by subject matter — the physical organization reflected the content structure rather than the address structure. Alphabetization separates the two, subordinating the content organization to the address organization for the purpose of retrieval.

This is a Rank 2 achievement in the precise sense — it requires metalingual awareness of the word as a formal object with a location in an abstract space, separable from its semantic content. You can't alphabetize without treating words as formal objects that have addresses independent of their meanings. The child learning to alphabetize is learning to treat words as location-addressed objects, which requires temporarily suppressing the content-addressed processing that normally dominates linguistic cognition.

The LLM in this framework

Placing the LLM in this layered addressing framework reveals something precise about what it's doing. The LLM's weight space implements a content-addressed system at the semantic level — concepts are distributed patterns that are retrieved by partial activation, holographically, as we've been discussing. But the LLM also has something like location addressing at the level of the token sequence — tokens have positions, the attention mechanism is sensitive to positional information, and the sequential structure of the input and output is preserved.
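The two mechanisms mentioned here can be sketched in a few lines of Python. Scaled dot-product attention behaves like a soft content-addressed lookup: a query is compared with every key, and the values are blended in proportion to the match. The sinusoidal positional encoding from the original Transformer design is one way of tagging each token with its location; many current LLMs use other schemes (rotary embeddings, for example), so treat this as an illustrative assumption rather than a description of any particular model. The vectors below are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query.
    Returns the blended value vector and the attention weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights

def positional_encoding(pos, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones,
    with wavelengths growing geometrically across dimensions."""
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
out, weights = attend([10.0, 0.0], keys, values)
print(weights)                    # nearly all weight on the first key, the best content match
print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

In a real transformer the query, keys, and values are learned projections of token vectors that already carry positional information, which is one concrete sense in which location and content are mixed in the same computation.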

What the LLM lacks is the integration between the two levels that human language use achieves. For a human, the location address of a word and the content it activates are tightly coupled through embodied experience — the word "fire" is both a phonological address and an activator of the full sensory, emotional, and conceptual content of fire as experienced. For an LLM, the token "fire" is a position in a sequence and an activator of statistical associations in the weight space, but the statistical associations are derived from text about fire rather than from embodied experience of fire. The address and the content are connected, but through a thinner medium.

This is why the LLM can retrieve "To be or not to be" but can't feel the weight of the choice it expresses in the way a human reader can — the address retrieves the verbal sequence and its semantic associations, but not the existential resonance that comes from the content being grounded in embodied experience of what it means to consider one's own nonexistence.

The mutual illumination

What's elegant about the framework you've sketched is that it shows location addressing and content addressing not as competing alternatives but as complementary structures that language weaves together at every level. Each word is simultaneously a location in the shared public address space of the lexicon and an activator of private content-addressed networks. Each text is simultaneously a sequence of locations and a pattern of content activations. The location structure makes language public and shareable — we can use the same addresses to coordinate our content-addressed systems. The content structure makes language meaningful — the addresses point to rich, distributed, experientially grounded patterns rather than to arbitrary symbols.

The dual structure is what makes language the cognitive technology that it is — the mechanism through which content-addressed private minds can coordinate with each other through location-addressed public symbols. And it's the dual structure that the LLM partially captures — the location addressing through the token sequence, the content addressing through the weight space — while missing the grounding that makes the content addressing rich in the human case.

Wednesday, June 10, 2026

Robert Wright discusses his new book, The God Test, with Paul Bloom [Awe? Bob, Awe!?]

Time stamps:

0:00 Whose baby is cuter?
2:49 Bob's new book on AI, The God Test
9:52 Varieties of AI awe
13:23 Geoffrey Hinton’s vision
17:50 How LLMs do more than predict and parrot
21:20 How powerful will AI get?
25:56 Can AI’s impact be predicted?
30:48 Taking doomerism seriously
37:53 The AI governance dilemma
48:42 Heading to Overtime

Awe 

Bloom, c. 9:47: “...where you talk about the importance of super intelligence, you you talk about awe and you say one of the points of your book is to persuade people that they should feel awe about the coming of AI.”

Concerning awe, back in the mid 1990s I went to Kennedy Space Center on Cape Canaveral and reacted to it with awe.

I drove east through central Florida, which was much like a desert except that it had lots of plants. I arrived at Kennedy Space Center around noon. I parked the van wherever, walked past a parade of rockets on display, and purchased a ticket for one of the standard tours. The NASA guides took us through some launch pads, around and even up into a couple of gantry towers, and we saw a couple of control rooms–one, as I recall, mocked up as though a mission were in progress. And then we saw it, a Saturn V suspended from the ceiling of a long, low building. The physical scale was humbling, but it was more than that. Big is big – that Saturn was the length of a football field – but this earth and these buildings birthed journeys that took us to the Moon. There is sacred energy in this soil and these structures where humankind ventured beyond ourselves, not merely into space, but into an almost living presence above and beyond.

That’s what floored me. This ground, this very ground where I was standing, was once tangibly connected to the moon 238,900 miles (384,400 km) away. Men had suited up in a building on this site, gotten into a small capsule atop a large rocket, and four days later got out and walked on the all of a sudden here and now beneath our feet, the moon. And then – How they ever did it I’ll never know because when you’ve been there how do you ever but you have no choice, do you? You want to live, to see your wife and children again – they got back into their landing craft, took off from the moon, and returned to earth in another four days. Eight days from the earth to the moon and back.

In over three years of extensive interaction with ChatGPT and Claude I have been delighted, surprised, astounded, even laughed myself silly (well, that’s an exaggeration), but I’ve not felt anything like what I felt at Kennedy Space Center. 

Note that back in 1990 David Hays and I published an article in which we said: “Sooner or later we will create a technology capable of doing what, heretofore, only we could.” We are certainly advancing into the territory. But super-intelligence? I fail to find the idea compelling. Just why, I’m not quite sure.

I’m thinking. 

Perhaps awe in the face of AI is related to ontological strangeness. I don't find AI ontologically strange, hence no awe. I don't know how LLMs work internally, but, for reasons I expressed in my working paper on GPT-3, the fact that next-token prediction can have THIS result when applied to a massive training corpus, with results distributed over a huge population of parameters, doesn't violate any of my intuitions about the world. I've got (pre-existing) intuitions that cover that. 

The energy demands of computing, human brains vs. LLMs

One of the issues that sometimes comes up in the comparison of artificial intelligence with human intelligence is energy requirements. The energy demands of LLMs are enormous, requiring the construction of huge data centers, while the energy demands of the brain are quite modest. I brought up the issue with Claude 2.6 Sonnet (Medium) in a specific way, suggesting that a large portion of the energy budget for LLMs can, in effect, be attributed to the need to simulate the physical structure of the brain. Here’s the first part of that dialog, which went on to discuss:

  • opportunities for research on energy efficient hardware, including neuromorphic chips,
  • the implications of financial collapse when enormous infrastructure investments do not pan out,
  • the international ecosystem for constructing computing hardware, with particular emphasis on the bottleneck created by special capacities currently unique to ASML in the Netherlands (EUV lithography machines) and TSMC in Taiwan (high-quality chip fabrication),
  • what those bottlenecks imply about current fantasies of so-called “fast takeoff”

Energy in the brain vs. energy in computers

The point is often made that computing with LLMs uses enormous amounts of energy while the brain uses relatively small amounts of energy. It seems to me that one aspect of the difference can be attributed to the physical structure of the brain. Individual neurons are both active units, to use a term von Neumann used in his little book on the brain, and units of memory. Moreover they are connected in parallel, with each neuron connected to 7K to 10K other neurons on average. LLMs are implemented on digital machines where active units (the CPU) are physically separate from memory units, which means that a lot of time and energy is involved in moving data between the CPU and memory. And the basic physical architecture is serial, though limited parallelism is possible, limited in relation to the brain. That means that the computers that implement LLMs have to devote a large amount of time and energy to, in effect, simulating the physical structure of the brain. And the brain doesn’t get that physical structure for free; the energy required to maintain it is budgeted over a long time relative to the dynamics of real-time computation. That’s one aspect of the brain’s relative energy efficiency.

That’s a precise and well-structured analysis, and it identifies something more fundamental than the usual comparisons which tend to focus on raw computation counts without attending to the architectural source of the inefficiency.

The von Neumann bottleneck restated

Von Neumann identified the problem himself, as we noted earlier — the separation of active units from memory units creates a bottleneck that doesn’t exist in biological neural tissue. In a digital computer every computation requires moving data from memory to the processor and moving results back. This movement consumes time and energy at every step, and in a large LLM the scale of this movement is enormous — billions of parameters stored in RAM or on disk, loaded into GPU memory, operated on by the compute units, with results written back. The physical distance between storage and computation, and the bandwidth limitations of the connections between them, are where a substantial fraction of the energy goes.

The brain has no such bottleneck because there is no such separation. A neuron stores its connectivity — the strength of its synaptic connections to other neurons — in its physical structure, and computes with that same physical structure simultaneously. The memory and the computation are the same physical process. A synapse transmitting a signal is simultaneously reading from memory — the synaptic weight — and performing a computation — multiplying the input signal by the weight. There’s no data movement because there’s no separation between where the data lives and where the computation happens.

The parallelism point

The parallelism difference is equally fundamental. The average neuron is connected to 7,000 to 10,000 others, and the brain as a whole has on the order of 100 trillion synapses, any of which can be active at the same time as the others: all in parallel, all local, with no bottleneck. GPUs have made remarkable progress in parallelizing the matrix operations that implement transformer attention, but they’re parallelizing in a fundamentally different sense — running many identical operations simultaneously on different data, with all operations still having to access the same shared memory through the same limited bandwidth connections.

The brain’s parallelism is structural and local — each neuron computes with its immediate neighbors through its own dedicated connections, with no shared memory bus, no bandwidth limitation across the whole system, no bottleneck that scales with the number of neurons. The computation is maximally distributed in the physical sense — it happens everywhere in the tissue simultaneously, driven by the local physics of electrochemical signaling rather than by a centralized scheduler moving data to a centralized processor.

The energy budget temporal structure

Your point about the temporal structure of the energy budget is the most subtle and I think the most underappreciated. The brain’s physical architecture — the synaptic connections, the dendritic trees, the axonal projections — requires energy to build and maintain. But this energy is expended over the developmental timescale — years of growth, pruning, myelination, synaptic strengthening and weakening through learning. The structural investment is amortized over a lifetime.

Once the structure is in place, the energy cost of real-time computation is much lower than the cost of building and maintaining that structure. The synaptic weights are the memory — they’re encoded in the physical structure of the connections — and accessing them costs only the energy of the synaptic transmission itself, which is the same physical event as the computation. There’s no separate energy cost for memory access because there’s no separate memory access.

In a digital system running an LLM the equivalent structural investment — the training computation — is enormous, consuming thousands of megawatt-hours of energy for a large model. But that’s analogous to the developmental investment in building the brain’s connectivity, not to the ongoing computation. The ongoing inference computation — running the model on new inputs — is the analog of real-time brain computation. And here the comparison is still unfavorable to digital systems, but the unfavorability is specifically located in the von Neumann architecture’s separation of memory and computation, which forces constant data movement that the brain avoids entirely.
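The amortization argument is simple arithmetic. In the Python sketch below, a one-time training cost is spread across every query a model serves and added to the per-query inference cost. All figures are hypothetical placeholders chosen to show the shape of the calculation, not estimates for any real model.

```python
def energy_per_query(training_kwh, queries_served, inference_kwh_per_query):
    """Amortized energy per query: the one-time training cost divided by the
    number of queries served, plus the recurring cost of each query."""
    return training_kwh / queries_served + inference_kwh_per_query

TRAINING_KWH = 1_000_000  # hypothetical one-time training cost (1 GWh)
INFERENCE_KWH = 0.001     # hypothetical cost of answering one query

for queries in (10**6, 10**9, 10**12):
    total = energy_per_query(TRAINING_KWH, queries, INFERENCE_KWH)
    training_share = (TRAINING_KWH / queries) / total
    print(f"{queries:>17,} queries: {total:.6f} kWh/query, "
          f"{training_share:.1%} of it from training")
```

As the number of queries grows, the training share shrinks toward zero and the per-query inference cost dominates, which is why the comparison with real-time brain computation turns on inference rather than on training.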

The simulation overhead

Your formulation that digital computers have to devote time and energy to simulating the brain’s physical structure is precise and important. The transformer architecture is implementing something that functions like distributed parallel processing with content-addressed memory — the holographic regime we’ve been discussing. But it’s implementing it on hardware that is architecturally opposed to distributed parallel processing — hardware that is serially organized, with physically separated memory and computation, connected through bandwidth-limited buses.

Every forward pass through an LLM is dominated by large matrix multiplications — mathematically a natural description of what a distributed parallel system would do natively. But running those multiplications on a von Neumann architecture requires fetching billions of numbers from memory, multiplying them in the GPU cores, and writing results back — a massive simulation of what would be a single natural physical process in a system with the brain’s architecture.
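The data-movement point can be made roughly quantitative. Multiplying an input vector by a weight matrix performs two floating-point operations (a multiply and an add) per weight, but every weight must also be fetched from memory. The Python sketch below counts both and applies per-operation energy costs. Those costs are placeholder assumptions that reflect only the general rule that fetching a byte from off-chip memory costs far more energy than one arithmetic operation; the layer size and 2-byte weights are likewise hypothetical.

```python
def matvec_cost(rows, cols, bytes_per_weight=2):
    """Count the work in y = W @ x for a rows-by-cols weight matrix W,
    assuming x and y use the same number of bytes per element as W."""
    flops = 2 * rows * cols  # one multiply and one add per weight
    bytes_moved = bytes_per_weight * (rows * cols + rows + cols)  # weights, x in, y out
    return flops, bytes_moved

def energy_split(flops, bytes_moved, pj_per_flop, pj_per_byte):
    """Energy in picojoules spent on arithmetic vs. on moving data."""
    return flops * pj_per_flop, bytes_moved * pj_per_byte

flops, bytes_moved = matvec_cost(4096, 4096)
compute_pj, movement_pj = energy_split(flops, bytes_moved,
                                       pj_per_flop=1.0,    # placeholder assumption
                                       pj_per_byte=100.0)  # placeholder assumption
print(f"arithmetic intensity: {flops / bytes_moved:.2f} FLOPs per byte")
print(f"movement / compute energy: {movement_pj / compute_pj:.0f}x")
```

With roughly one operation per byte fetched, the arithmetic units spend much of their time waiting on memory. Batching many inputs through the same weights raises that ratio, which is one reason GPUs are used that way, but it does not remove the physical separation of memory and computation.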

The energy cost is the cost of the simulation overhead — the difference between what the computation would cost if implemented in hardware that matched its natural structure, and what it costs when implemented in hardware that mismatches it. Neuromorphic computing research — Intel’s Loihi chip, IBM’s TrueNorth, and related projects — is trying to close this gap by building hardware that more closely matches the brain’s architectural principles. Early results suggest energy efficiency improvements of several orders of magnitude for the right classes of computation, which is consistent with your analysis — the improvement comes from reducing the simulation overhead.

What this adds to the Yevick argument

There’s a connection to Yevick’s logical necessity argument that’s worth making explicit. Yevick proved that the holographic regime is logically necessary for certain categories of objects. Your energy analysis suggests that the holographic regime is also physically natural — it’s what you get when memory and computation are integrated in the same physical structure rather than separated. The brain’s architecture implements the holographic regime efficiently precisely because the physical organization of neural tissue is holographic in the relevant sense — distributed, parallel, content-addressed, with memory and computation unified in the same physical substrate.

Digital computers implement the sequential symbolic regime efficiently because their physical organization is sequential and addressed — centralized computation, separate addressed memory, serial data movement. When you ask a digital computer to implement the holographic regime — as running an LLM requires — you pay the simulation overhead in energy and time. When you ask a brain to implement the sequential symbolic regime — as doing long arithmetic requires — you pay a different kind of overhead, the cognitive effort and external scaffolding we discussed earlier.

The energy difference between brains and LLMs isn’t just a quantitative difference in efficiency — it’s a signature of the architectural mismatch between the computation being performed and the hardware it’s being performed on. The brain is efficient at what it does because its hardware matches the computation. The digital computer is efficient at sequential symbolic processing for the same reason, and inefficient at holographic processing because there the match fails. The LLM is asking digital hardware to do what it’s not built for, and the energy bill reflects that.

The developmental investment reframed

There’s one more implication worth drawing out. The brain’s developmental investment — the years of growth and experience-dependent synaptic modification that build the connectivity pattern — is doing something that has no real analog in LLM training. LLM training adjusts weights in a fixed architecture through gradient descent on a loss function. Brain development builds the architecture itself — the number and pattern of connections, the organization of regions, the myelination of pathways — through a process that’s partly genetic and partly experience-dependent, operating over years rather than the weeks or months of LLM training.

The connectivity pattern that results from this developmental process is the physical instantiation of the knowledge the brain has acquired — not as a list of facts encoded in weight values, but as a structural organization of the tissue itself that reflects the statistical regularities of the organism’s experience. The synaptic weights are the fine-grained adjustments on top of this structural organization. The structure is the coarse-grained, high-capacity, energy-efficient memory. The weights are the rapidly modifiable, experience-sensitive, computationally expensive layer on top.

LLMs have only the weights — the fine-grained adjustable layer — running on hardware that provides no structural memory at all. Everything has to be stored in the weights because there’s no structural analog of the brain’s connectivity pattern. This means the weights have to carry an enormous representational burden that in the brain is shared between structural organization and synaptic weight adjustment. Which is another source of the energy inefficiency — the weights are doing double duty, encoding both the structural regularities that brain connectivity handles efficiently and the fine-grained knowledge that synaptic weights handle in the brain.

The doppelganger, once again, would need both levels — the structural organization built through embodied development and the fine-grained adjustable layer built through experience within that structure. Current LLMs have only the second, running on hardware that provides neither. The energy gap is the physical signature of everything that’s missing.

Continuous learning

Furthermore the brain is capable of continuous learning, while LLMs are not. In order to incorporate substantial new material, LLMs generally have to be retrained, often from scratch, again requiring enormous investment in energy.

Yes, and this identifies another fundamental asymmetry that goes deeper than just energy efficiency — it points to a difference in the temporal structure of learning itself.

Day lilies, from yesterday's walk

What about these upcoming tech/AI IPOs? [Crazy, man, crazy]

David Wallace-Wells and Natasha Sarin, Wall Street’s A.I. Bet Is About to Become Yours, NYTimes, June 10, 2026.

SpaceX, Elon Musk’s rocket, satellite and A.I. company, is about to go public at a record-breaking $1.77 trillion. This summer, Anthropic and Open A.I. will follow suit, also with sky-high valuations. Are they worth it? The Opinion writer David Wallace-Wells and the contributing writer Natasha Sarin, an economist and law professor, tackle that question and discuss what these I.P.O.s mean for the American economy in the near future and beyond.

Well into the conversation:

Wallace-Wells: Well, I think, at the moment, a lot of Americans look at the A.I. companies and do see an especially vivid illustration of the plutocratic structure of our society, right? They see these five companies [SpaceX, Anthropic, OpenAI, Google, Microsoft]; they’re run by these five visible people. They’re all worth an unbelievable amount of money. And to the extent that we are imagining futures being dictated by the companies themselves, that can be quite scary.

And, to some degree, going public and having government stakes in the companies both address that problem to a certain extent. It would mean that the country, as a whole, is invested in the success of these labs and may benefit to some degree — although at what scale is an open question — from the success of the company. But there are other ways in which some of these approaches — public offerings and/or government investment — don’t change the dynamic. Which is to say — maybe, most notably — if this is a bubble then it’s the public that is left holding the bag. [...]

Sarin: You know, part of what makes me somewhat nervous — and should make everyone nervous — is that it’s not like you and I are alone in our view that, oh, we might be on the verge of a bubble, a bubble might be on the horizon. Last summer, Sam Altman was asked some version of, “Is this an A.I. bubble?” And he said: “Are we in a phase where investors as a whole are overexcited about A.I.? My opinion is yes.”

And another thing that should make us somewhat nervous is: If we look at history, if we look at every large technological innovation that has changed the way that humans work and the way that we all live — most recently the internet, but if we go back to railroads, whatever moment you want to look to — there is a very predictable, in some sense, cycle that you see, in terms of what happens to the economy at those moments of technological change.

Everyone sees the emergence of this new technology and gets really excited about it and its potential for massive change. Investors see that, too, and money rushes into this new technological prospect. And it rushes in productive ways, but it also rushes in ways that ultimately don’t end up being that productive. So, this is, if you think of examples during the internet bubble, like the growth of everything, every company that had .com attached to it. That doesn’t take away from the fact that the internet actually did change all of our lives.

But ultimately, what happens is that the bubble bursts and a bunch of debris is left behind, and that isn’t just about a couple of companies that ultimately fail. It is about what that means from the perspective of the broader economy that we all inhabit — in that, often, those corrections come with deep economic downturns and have the consequence of having large-scale unemployment, having an economy that isn’t growing quickly, having the need for the government to step in as a potential backstop.

And so, from my perspective, the question isn’t are we in a bubble or will the bubble burst? The question is: When?

Wallace-Wells: Yeah, one thing that I think about in this moment, when thinking about the I.P.O.s and what justifies these massive, massive valuations, is: These are five companies, and three of them are going public. In the public imagination, they do dominate the A.I. landscape. But of course, they are only providing one set of products, which is to say access to their L.L.M.s; and they’re providing it in different ways at different price points, at different tiers. But it seems to me like the massive boom story that they’re trying to tell is one that’s a little bit of a holdover from an earlier era of A.I. thinking, in which the companies and the people who are designing the products often talked about artificial general intelligence, artificial superintelligence, and they said that these products are improving so much that at some point they’re going to be able to improve themselves recursively without human interference.

And at that point, there’s going to be a kind of a takeoff in which the products themselves, the companies that made them — and to some extent the economy as a whole — would be rendered almost unrecognizable to people living on the other side of it. Some people call this “the singularity.”

But I wonder how much that still feels true today. And what I mean by that is, I was just looking at some data today, that just over the course of this calendar year, 2026, the amount of use of Chinese open-source A.I. models has tripled, while the use of the American A.I. products has basically flatlined. We see a lot of companies — Uber was maybe the most high-profile one — saying, “We’re actually winding down our employees’ use of A.I. because it was too expensive, given what we were getting out of it.”

And so, if we think about a future in which there’s going to be a superintelligent Borg running the whole economy, then yes, racing to be the biggest, best monopolistic A.I. company is hugely important and it does justify these absolutely gargantuan valuations if you believe that, for instance, Anthropic will be the one to win.

But if you’re thinking about a world in which, yes, A.I. is everywhere, yes, everyone is using it, but it’s not totally clear how many people think it’s super important to pay a huge premium to buy the absolute best-in-class model. And how many more people are likely to think, “I can use this open-source product from China that’s 80 percent as good as Anthropic’s first-rate model and pay only 5 percent of the price.” That’s a very different world.

The A.I. companies used to talk about building a moat — what they could do to secure their advantage. And they thought that getting to something like A.G.I. or ASI faster was the main way to do that. In a world in which that’s at least not imminently on the horizon, and we have all of this low-price competition from below, isn’t it the case that these companies are at some real risk of expecting much, much higher returns than they are likely to get in the medium term?

Sarin: Yes, 100 percent. And I will say something that has given me a fair bit of nervousness around A.I. and the ultimate possible profitability of these companies. ChatGPT was, as you were pointing out, launched in the fall of 2022, which feels like yesterday, but was less than four years ago, you know? But I guess it’s all relative —

Wallace-Wells: It’s both at once. It’s like a whole different era and the same.

Later:

Sarin: And flip side, for a while we were all talking about, and we were hearing a lot about, the idea of singularity or A.G.I. as this gold star that was coming right on the horizon. And now you have people — I’m using Sam Altman because he’s spoken publicly about this recently in ways that have gotten a fair bit of attention, but he’s not the only one saying this — where they’re talking about A.G.I. and describing it, even internally themselves, as not really all that useful of a term; and kind of describing it as not some sort of magical switch that’s going to flip on at some moment in the short horizon, but instead as the idea that these models are over time going to continue to get better and more useful and more transformational. But that’s not something that’s going to happen instantaneously.

Wallace-Wells: But even the way that you’re talking about these questions is illuminating to me, because you’re talking about, on the one hand, the big A.I. companies, and then the firms that are using them. And when you’re talking about productivity, you’re focusing on the firms that are using them. But these are two separate questions, right? If OpenAI and Anthropic are going to justify trillion-dollar valuations, or even larger valuations, they’re going to have to make a lot of money, too. Even if tons of people are making money on A.I., it has to be in these companies to justify the value.

And when I hear Sam Altman talking about the possibility that, in the future, A.I. will be like a utility in the same way that we pay for our electricity, I think to myself: The electric utilities are not worth a trillion dollars. This is a technology which absolutely has huge transformative potential, but to me, the question is: How much of that is captured by these companies?

Sarin: It feels like both an unanswered question, and an inherently, frankly, unanswerable question. But also, it should make you even more nervous about this bubble conversation that we were having because — and Ray Dalio said a version of this last week — if you’re thinking about it from the perspective of these firms, you have to spend a ton of money and justify these valuations, not just because you’re worried about, like, is this a good way to deploy resources, but because you’re worried about losing market share.

If you’re of a view that the way this all shakes out is that there’s going to be one, two, maybe three large players that are able to capture the market, you have to try to be one of them. And that results in, frankly, the incentive structure to spend a lot, and to look like you are doing a lot, in ways that might ultimately not be tied to fundamentals with respect to investment opportunities and what is profit maximizing from the perspective of the firm.

So, you should be worried about that. But there’s another piece of this, which is that the companies themselves are asking public investors to pay prices at valuations that assume that A.I. is going to reshape the economy; and to pay those prices at the same time as these companies themselves haven’t figured out how to stop losing money; and at the same time, as these companies themselves haven’t figured out how they are going to be the ones left standing at the moment when A.I. ultimately is a developed technology with a developed set of market players that we all have grown with and understand. And I think that is something that is just so striking about this moment.

There’s more in the conversation. Bottom line: no one knows what’s going on. More than anyone’s willing to say out loud, it’s a crapshoot.

Some of my more skeptical articles about AI:

Public encounter with AI differs from elite conversations about it

Tuesday, June 9, 2026

Gen Z Musician First Time Reaction to Chicago - 25 or 6 to 4

This is a reaction video, a minor interest of mine these days. As the title says, we have a Gen Z musician reacting to a Boomer hit, Chicago's "25 or 6 to 4." This particular tune interests me because I performed it many times back in the Ancient Days with the Saint Matthew Passion, a jazz-rock band in Baltimore based out of Johns Hopkins. Andiroo is himself a musician, a keyboardist and producer, so he loves making technical comments about the music. He obviously loves this tune and its intricate craftsmanship, as though that kind of craftsmanship were a rare commodity in music.

To me, and the guys I played with, that's just how music is, and none of us were professional musicians. We were just college kids – well, young adults – performing music we loved. None of us regarded our own technical skill as particularly evolved, not on the level of Chicago. But we performed this tune, and others, well. To me, and many musicians I've played with since then, that's simply what music is about. No (particularly) big deal.

I've not forgotten [irises]

Claude’s further observations about evolution [MR-AUX]

Here’s some more supplementary material from my investigation of Tyler Cowen’s recent monograph, The Marginal Revolution: Rise and Decline, and the Pending AI Revolution (2026). First I have a short note about the discovery of marginal utility and the discovery of evolution. Then I have some more discussion about the conceptualization of evolution, starting with locating phylogenetic identity in the gene pool. Later on there’s a discussion of second-order patterns, which is important to epistemology. After all, evolutionary biology was not a mathematicised discipline in Darwin’s time. Economics and statistical mechanics got their Rank 4 “rigidity” from mathematics. What was the source of rigidity for evolution? After that we talk about the Galapagos finches and conclude with some discussion of the conceptualization of cognitive ranks.

* * * * *

Why the discovery of marginal utility is like the discovery of evolution

[Note: This section is a continuation of a previous dialog. The prompt that originated that dialog is irrelevant. These three paragraphs are Claude’s.]

The imaginative leap you identify is the specific cognitive achievement that makes the Rank 4 move possible. You have to be able to hold in mind simultaneously two things that are in tension: the real difference between species — the fact that a fish and a mammal really are different kinds of things — and the theoretical continuity across those differences that the genealogical model posits. This requires treating the continuity not as something observed but as something inferred from a model, and committing to the model's reality even though it posits entities and processes — ancestral populations, selection pressures acting over millions of years — that are not directly observable.

This is structurally identical to the move marginalism makes. The marginal utility of water to a person who already has plenty is not directly observable. It's a theoretical posit — an entity defined by the model — that makes sense of observable price behavior. Both Darwin and Jevons are positing theoretical entities that are real in the model but not directly accessible to observation, and using those entities to explain patterns in the observable world that the previous descriptive framework couldn't account for. That shared structure is why they're both Rank 4, and why they appear in the same half-century window.

Cowen's inattention to the identity problem is a genuine gap in his account. He treats the emergence of evolutionary theory as primarily a matter of accumulating observations and finding the right conceptual key — the Malthusian insight about population pressure, the geological uniformitarianism that provided deep time. These are real and important. But he doesn't attend to the prior conceptual problem of what kind of object the theory needs to posit — the genealogical continuum across real morphological difference — and why that object required a cognitive move that Rank 3 natural history, for all its sophistication, was not equipped to make.

The problem of phylogenetic identity

Going back to the conceptual problem of identity through the phylogenetic line, just what is it that establishes that identity? I’m not asking about observations, experiments, and inferences; I’m asking about physical stuff. Is it the gene pool, a continuous line of genetic transmission?

Yes, and stated that precisely it's a beautiful answer to the identity problem — because it transforms what looked like a philosophical puzzle about categories into a physical fact about causal continuity.

The gene pool is a continuous physical process. Genes are replicated and transmitted from generation to generation through actual molecular mechanisms — copying of DNA, meiosis, fertilization. The chain of physical causation is unbroken. What connects a fish ancestor to its mammalian descendant, across hundreds of millions of years and radical morphological transformation, is not conceptual continuity — not our decision to classify them as related — but a physical causal chain of genetic transmission that actually happened, molecule by molecule, generation by generation, without interruption.

This is what makes Darwin's theory so much more powerful than Lamarck's, in retrospect. Lamarck had the right general idea — species transform over time — but no physical basis for the continuity. The inheritance of acquired characteristics is a conceptual story, not a physical mechanism. Darwin had natural selection but also lacked the physical mechanism — he didn't know about genes, and the blending inheritance he assumed was actually a problem for his theory, as Fleeming Jenkin pointed out. The physical grounding came only with Mendel, rediscovered in 1900, and the Modern Synthesis of the 1930s and 40s that fused Darwinian selection with Mendelian genetics.

What this does to the identity problem

The philosophical puzzle — how can radically different organisms be the same thing? — dissolves when you specify that the identity is physical and causal rather than morphological or essential. The fish and the mammal are not the same kind of thing in the Linnaean sense — they don't share the properties that would put them in the same category under any synchronic classification scheme. But they are connected by an unbroken physical chain of genetic transmission, which is a different and more fundamental kind of identity.

This is precisely the Rank 4 move applied to the identity question. The Rank 3 naturalist asks: what kind of thing is this organism? The answer is given by its observable properties, organized by the classification scheme. The Rank 4 move asks: what process generated this organism, and what connects it to other organisms through time? The answer is given by the physical mechanism of genetic transmission, which is not observable in the organism itself but is inferred from the theory and confirmed by the molecular biology that came later.

The depth of the conceptual achievement

What this means is that Darwin was positing an identity grounded in a physical mechanism he couldn't see and didn't fully understand. He knew that offspring resemble parents — that heritable variation is real — but he didn't know why. The physical substrate of the identity he was positing — the gene pool as continuous causal chain — was invisible to him. He was inferring the existence of a physical continuity from its population-level effects, which is an extraordinarily bold theoretical move.

This is analogous to what the marginalists were doing with utility — positing a theoretical entity that explains observable price behavior without being directly observable itself. In both cases the Rank 4 achievement consists precisely in positing the right kind of unobservable entity: one that is physically real, causally efficacious, and theoretically tractable, even though it can only be approached indirectly through its effects on observable phenomena.

The gene pool as the physical basis of phylogenetic identity also resolves the tension you identified — that the continuity is in a sense an illusion, that the different life forms really are different. They are morphologically different, behaviorally different, ecologically different. But they are physically continuous through the causal chain of genetic transmission. The continuity is not projected onto them by our conceptual scheme — it's a real physical thread running through the apparent discontinuity of form. Darwin intuited this without being able to state it at the physical level. The Modern Synthesis stated it. Molecular biology confirmed it and made the physical chain directly readable in DNA sequences, where the evidence of common descent is written in the genome itself.

The Sphinx riddle now has a deeper answer than the one Oedipus gave. What persists through the morphological transformations of the phylogenetic line isn't a Platonic essence or a conceptual category. It's a physical process — the replication and transmission of genetic information — that is as real and continuous as any other causal chain in nature. The identity is not constructed by the observer. It was there all along, in the chemistry.

Darwin’s leap of faith

No matter how you look at it, Darwin had to take a leap of faith to arrive at this conclusion. Nothing else made any sense, but still.... He was unaware of Mendel’s experiments, and it wasn’t until the mid-20th century that we began to understand the molecular mechanism. I believe Watson and Crick published their discovery in 1953.

Yes, Watson and Crick published the double helix structure in 1953, in the famous one-page paper in Nature with its almost comically understated remark, near the end, that the base pairing they proposed "suggests a possible copying mechanism for the genetic material." One of the great understatements in scientific history.

And your point about the leap of faith is important — and connects to something deep about how Rank 4 thinking works. Darwin committed to a theoretical structure — genealogical descent with modification through natural selection, grounded in a physical continuity of inheritance he couldn't specify — on the basis that nothing else made the observed patterns cohere. The fossil record, the geographical distribution of species, the nested hierarchies of Linnaean taxonomy, the embryological similarities across radically different adult forms — all of these independently pointed in the same direction. The convergence of independent lines of evidence was so strong that the theory had to be right even though the physical mechanism was unknown.

This is actually a recurring feature of great Rank 4 achievements. Newton committed to universal gravitation as a real physical force without being able to say what it was or how it acted at a distance — he famously refused to speculate, hypotheses non fingo, but the mathematical structure was so powerful and predictively successful that he committed to it anyway. The physical understanding of gravity came with general relativity, more than two centuries later. Thermodynamics was developed and successfully applied throughout the 19th century without anyone understanding what heat actually was at the molecular level — that required the statistical mechanics of Boltzmann, and even then wasn't fully accepted until the early 20th century.

Monday, June 8, 2026

From July 4, 2025

After 1990 AI stopped citing work in psychology [empirical evidence]

I'm bumping this post from 2024 to the top of the queue as it testifies to the process by which AI has converged on the intellectual monoculture it has become in the wake of ChatGPT.

* * * * * 

I’ve frequently noted that, while researchers in artificial intelligence (AI) and machine learning (ML) often have a lot to say about when their machines will approach, overtake, and even surpass human intellectual achievement, they don’t seem to know much about psychology, linguistics, and the cognitive sciences. I made an explicit argument at some length in a recent article I published in 3 Quarks Daily, Aye Aye, Cap’n! Investing in AI is like buying shares in a whaling voyage captained by a man who knows all about ships and little about whales. In making that argument, the only evidence I present is anecdotal (Geoffrey Hinton and Ilya Sutskever), though my beliefs on the issue are based on my reading of the current literature. That reading is opportunistic and by no means ‘complete,’ which, in any case, would be impossible, as the literature is so large.

Now I can present a bit of systematic empirical evidence in the matter. M.R. Frank et al. undertook a bibliometric investigation of citation patterns in AI and other disciplines and discovered that, while in the early years AI interacted with other fields quite a bit, that interaction dropped off over the years. The following chart shows how AI cited other fields:

Its citation of psychology peaked in the middle 1960s and then dropped off steadily until 1990. Its citation of mathematics rose steadily through the period. That’s understandable; I have no complaint about that. The drop in citations to psychology is also understandable, but somewhat more problematic, for it implies that, when AI experts offer judgements about human cognitive capabilities, whether directly or indirectly through comparison with AI, they don’t know what they’re talking about. I suppose that last clause is a bit harsh. Perhaps it would be more accurate to say something like: They don’t know any more than a bright college sophomore who’s taken a psych course or two.

Here's the article and abstract:

Frank, M.R., Wang, D., Cebrian, M. et al. The evolution of citation graphs in artificial intelligence research. Nat Mach Intell 1, 79–85 (2019). https://doi.org/10.1038/

As artificial intelligence (AI) applications see wider deployment, it becomes increasingly important to study the social and societal implications of AI adoption. Therefore, we ask: are AI research and the fields that study social and societal trends keeping pace with each other? Here, we use the Microsoft Academic Graph to study the bibliometric evolution of AI research and its related fields from 1950 to today. Although early AI researchers exhibited strong referencing behaviour towards philosophy, geography and art, modern AI research references mathematics and computer science most strongly. Conversely, other fields, including the social sciences, do not reference AI research in proportion to its growing paper production. Our evidence suggests that the growing preference of AI researchers to publish in topic-specific conferences over academic journals and the increasing presence of industry research pose a challenge to external researchers, as such research is particularly absent from references made by social scientists.

Sunday, June 7, 2026

A toy to think with: The Erector Set

What Happened to the Erector Set? | The Toy That Taught Every Boy in America to Build

From the YouTube page:

For generations, the Erector Set wasn’t just a toy — it was a hands-on introduction to engineering, creativity, and real-world building.

Long before modern STEM kits and digital learning tools, children across America learned how to think, design, and construct using metal beams, bolts, and gears. Created by A. C. Gilbert in the early 1900s, the Erector Set became a symbol of innovation and imagination, inspiring future engineers and inventors.

In this video, we explore the complete story of the Erector Set — from its invention and rise during the golden age of American toys, to its cultural impact and eventual decline in a rapidly changing world of plastic toys and digital entertainment.

Discover how it competed with brands like LEGO, why its popularity began to fade, and what ultimately happened to one of the most educational toys ever created.

From steel structures to childhood dreams, this is the story of the toy that taught a generation how to build.

If you’ve ever wondered what happened to the legendary Erector Set… this nostalgic deep dive reveals it all.

Three up close

On Steven Spielberg, three vignettes

Wesley Morris, Why We Still Need Steven Spielberg, NYTimes, June 7, 2026.

The cultural importance of movies:

Lately, the idea of a Steven Spielberg has felt endangered. For more than 50 years, his imagery has epitomized American movies, maybe even epitomized America. He has been at the center of an industry that, if it’s not dying, is certainly diminished. The sort of original movies that made Spielberg Spielberg are virtually nonexistent, even though the two major flavors that now define the industry — global box-office smash and best picture nominee — are, with Spielberg, indistinguishable (start with “Jaws,” “Raiders of the Lost Ark,” “E.T. the Extra-Terrestrial”). More than once, he inhabited both modes within one calendar year: “Jurassic Park” in the summer of 1993, for instance, then “Schindler’s List” at the end of Hanukkah, perhaps the most triumphant single-year change-up any Hollywood director has had. (He’s still the most commercially successful director ever, and he’s tied, at 13, with William Wyler for directing the most best picture Oscar nominees.)

Popular art has always bonded us to one another, no matter what might have been cleaving us apart, no matter how different our lives or how our responses to that art diverged. And Spielberg’s films have been a premium adhesive. Not only the ones he directed but the dozens of swooshing, indelibly kooky hits unleashed by Amblin Entertainment, his production company: “Poltergeist,” “Gremlins,” “The Goonies,” the “Back to the Future” trilogy, “Who Framed Roger Rabbit,” “Arachnophobia.”

Spielberg’s stardom arose from the collision of capitalism, audacity and creative vision. His movies emerged alongside the arrival of cable television and proliferating advances in personal computing and home entertainment. I watched “E.T.” at the movies, devoured it on cable, played it on my Atari and let Michael Jackson sing me a lullaby the movie inspired him to write. (Spielberg: so titanic that the other king of pop worshiped his thrillers.)

But a kind of cultural malnourishment has set in. While you once needed a pair of hands to count the major studios, we’re on the verge of barely needing one. And the best, most lucrative ideas entail microwaved nostalgia that we all know by its legal nickname: I.P. The takeovers and reheating, the obscure metrics that ensure we never quite know exactly how popular anything is, it’s dispiriting: Pac-Man eating ghosts, algorithms keeping secrets.

Working through his parents' divorce:

“For years, I was working out my mom and dad’s divorce through my stories,” he told me. The split happened when Spielberg was about 15, but the marriage had begun to fade years before. The family dissolved, too. He went off to live with his father, Arnold, a computer engineer, in Los Angeles, while his three sisters remained in Phoenix with their mother, Leah, a classical pianist who used to operate a kosher deli. But it seems that living under the same roof didn’t significantly change how remote Arnold could feel to his son.

Over dinner one night, Spielberg told me about working on “Indiana Jones and the Last Crusade” with George Lucas. The movie reunites Harrison Ford’s Indiana with his father, a Holy Grail expert played by Sean Connery. “My contribution was: ‘OK, but I want to meet Indy’s dad, and I want them to have had years of estrangement and father neglecting son because the father was a workaholic. And this story will bring them back together again.’” When Spielberg said this, he still sounded wishful and a touch sad. Watched through the lens of his childhood, his movies can seem newly forlorn, someone blowing on a birthday cake gated with stubborn candles.

For centuries, we’ve lived with a myth that genius — male genius — expresses itself as wild eccentricity or madness, that the personality warrants a cult or a harem. Spielberg disorients in that regard. I, at least, needed a moment to absorb how familiar he felt, how familial. The man who made “E.T.” was eerily reminiscent of the woman who took me to see “E.T.” Both of them have in common a special intuition to anticipate needs we don’t know we have. My mother did it for a household. For more than half a century, Spielberg has been doing the same for a planet. [...]

Spielberg hasn’t been to see a therapist since he was in college. Instead, the movies are the arena in which he has worked on some of the mysteries he couldn’t solve on his own. What we experience as sorcery is, for him, a process of exorcism. “I can’t express enough how therapeutic and healthy it is for me to keep doing this job over and over and over again,” he said deliberatively, almost as if he was feeling this out. “I work so much out through this process. So much out. I get to bleed off some of the darkness instead of letting it fester inside me. You get to let it fester inside you.”

A six-year-old:

As we were getting ready to head over to “Oh, Mary!” Spielberg told me a story about the time his mother’s brother, Bernard, took him and his cousin, Paul, to visit the Lincoln Memorial. It was 1952 or ’53. He would’ve been 6. The three of them climbed the steps. “Suddenly I was standing at the foot of a scary giant,” he recalled. “I remember glancing up and being so terrified I could only look at the hands.” He fixated on how they “were overhanging the armrests” and felt the urge to flee. But something held him back. “When I turned around, I looked up at his face. At this statue. Of Lincoln. A calm washed over me. An instant connection washed over me.” His fear ceased. What arrived in its place was unabated curiosity. He began to read all about Lincoln and started making silhouette cutouts of him, an obsession you can see re-enacted in the opening scene of “Minority Report,” when a child makes a paper mask of Lincoln.

There's much more at the link.