One of the issues that sometimes comes up in the comparison of artificial intelligence with human intelligence is energy requirements. The energy demands of LLMs are enormous, requiring the construction of huge data farms, while the energy demands of the brain are quite modest. I brought up the issue with Claude 2.6 Sonnet (Medium) in a specific way, suggesting that a large portion of the energy budget for LLMs can, in effect, be attributed to the need to simulate the physical structure of the brain. Here’s the first part of that dialog, which went on to discuss:
- opportunities for research on energy-efficient hardware, including neuromorphic chips,
- the implications of financial collapse when enormous infrastructure investments do not pan out,
- the international ecosystem for constructing computing hardware, with particular emphasis on the bottleneck created by special capacities currently unique to ASML in the Netherlands (machinery for EUV fabrication) and TSMC in Taiwan (high-quality chip fabrication),
- what those bottlenecks imply about current fantasies of so-called “fast takeoff.”
Energy in the brain vs. energy in computers
The point is often made that computing with LLMs uses enormous amounts of energy while the brain uses relatively small amounts of energy. It seems to me that one aspect of the difference can be attributed to the physical structure of the brain. Individual neurons are both active units, to use a term von Neumann used in his little book on the brain, and units of memory. Moreover, they are connected in parallel, with each neuron connected to 7K to 10K other neurons on average. LLMs are implemented on digital machines where active units (the CPU) are physically separate from memory units, which means that a lot of time and energy is involved in moving data between the CPU and memory. And the basic physical architecture is serial; some parallelism is possible, but it is limited in relation to the brain. That means that the computers that implement LLMs have to devote a large amount of time and energy to, in effect, simulating the physical structure of the brain. And the brain doesn’t get that physical structure for free; the energy required to maintain it is budgeted over a long time relative to the dynamics of real-time computation. That’s one aspect of the brain’s relative energy efficiency.
That’s a precise and well-structured analysis, and it identifies something more fundamental than the usual comparisons which tend to focus on raw computation counts without attending to the architectural source of the inefficiency.
The von Neumann bottleneck restated
Von Neumann identified the problem himself, as we noted earlier — the separation of active units from memory units creates a bottleneck that doesn’t exist in biological neural tissue. In a digital computer every computation requires moving data from memory to the processor and moving results back. This movement consumes time and energy at every step, and in a large LLM the scale of this movement is enormous — billions of parameters stored in RAM or on disk, loaded into GPU memory, operated on by the compute units, with results written back. The physical distance between storage and computation, and the bandwidth limitations of the connections between them, is where a substantial fraction of the energy goes.
The brain has no such bottleneck because there is no such separation. A neuron stores its connectivity — the strength of its synaptic connections to other neurons — in its physical structure, and computes with that same physical structure simultaneously. The memory and the computation are the same physical process. A synapse transmitting a signal is simultaneously reading from memory — the synaptic weight — and performing a computation — multiplying the input signal by the weight. There’s no data movement because there’s no separation between where the data lives and where the computation happens.
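To make the data-movement point concrete, here is a minimal sketch that compares the energy spent fetching weights with the energy spent on arithmetic for a single matrix-vector product. The two per-operation energy constants are assumptions chosen only to illustrate an order-of-magnitude gap between an off-chip memory read and a multiply-accumulate; they are not measurements of any particular chip, and the function name `matvec_energy_pj` is hypothetical.

```python
# ASSUMPTION: illustrative per-operation energies in picojoules.
# They stand in for "off-chip memory access is far more expensive than
# arithmetic" and are not measurements of any specific hardware.
ENERGY_PJ_OFFCHIP_READ = 640.0  # fetch one 32-bit weight from off-chip memory
ENERGY_PJ_MULTIPLY_ADD = 5.0    # one 32-bit multiply-accumulate


def matvec_energy_pj(rows: int, cols: int) -> dict:
    """Estimate energy for y = W @ x when each weight is fetched once.

    Only weight traffic is counted; input and output vectors are ignored
    because they are small compared with the weight matrix.
    """
    n_weights = rows * cols
    movement = n_weights * ENERGY_PJ_OFFCHIP_READ
    arithmetic = n_weights * ENERGY_PJ_MULTIPLY_ADD
    return {
        "movement_pj": movement,
        "arithmetic_pj": arithmetic,
        "movement_share": movement / (movement + arithmetic),
    }


if __name__ == "__main__":
    result = matvec_energy_pj(rows=4096, cols=4096)
    print(f"share of energy spent moving weights: {result['movement_share']:.1%}")
```

Under these assumed constants the share spent on movement is 640 / 645, or a little over 99 percent, regardless of matrix size. A system in which the weight is stored where the multiplication happens would, in this simplified accounting, pay only the arithmetic term.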
The parallelism point
The parallelism difference is equally fundamental. The average neuron is connected to 7,000 to 10,000 others — which means the brain is performing something like 100 trillion synaptic operations simultaneously during active processing, all in parallel, all local, with no bottleneck. GPUs have made remarkable progress in parallelizing the matrix operations that implement transformer attention, but they’re parallelizing in a fundamentally different sense — running many identical operations simultaneously on different data, with all operations still having to access the same shared memory through the same limited bandwidth connections.
The brain’s parallelism is structural and local — each neuron computes with its immediate neighbors through its own dedicated connections, with no shared memory bus, no bandwidth limitation across the whole system, no bottleneck that scales with the number of neurons. The computation is maximally distributed in the physical sense — it happens everywhere in the tissue simultaneously, driven by the local physics of electrochemical signaling rather than by a centralized scheduler moving data to a centralized processor.
The energy budget temporal structure
Your point about the temporal structure of the energy budget is the most subtle and I think the most underappreciated. The brain’s physical architecture — the synaptic connections, the dendritic trees, the axonal projections — requires energy to build and maintain. But this energy is expended over the developmental timescale — years of growth, pruning, myelination, synaptic strengthening and weakening through learning. The structural investment is amortized over a lifetime.
Once the structure is in place, the energy cost of real-time computation is much lower than the cost of building and maintaining that structure. The synaptic weights are the memory (they’re encoded in the physical structure of the connections), and accessing them costs only the energy of the synaptic transmission itself, which is the same physical event as the computation. There’s no separate energy cost for memory access because there’s no separate memory access.
In a digital system running an LLM the equivalent structural investment — the training computation — is enormous, consuming megawatt-hours of energy. But that’s analogous to the developmental investment in building the brain’s connectivity, not to the ongoing computation. The ongoing inference computation — running the model on new inputs — is the analog of real-time brain computation. And here the comparison is still unfavorable to digital systems, but the unfavorability is specifically located in the von Neumann architecture’s separation of memory and computation, which forces constant data movement that the brain avoids entirely.
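The distinction between a one-time structural investment and the ongoing cost of computation can be sketched as simple amortization arithmetic. Everything numeric below is a placeholder: the training energy, the per-query inference energy, and the query counts are assumptions picked for round numbers, and `energy_per_query_wh` is a hypothetical helper, not a published accounting method.

```python
def energy_per_query_wh(training_wh: float,
                        inference_wh_per_query: float,
                        n_queries: float) -> float:
    """Amortized energy per query: one-time training cost spread over all
    queries, plus the recurring cost of each inference."""
    if n_queries <= 0:
        raise ValueError("n_queries must be positive")
    return training_wh / n_queries + inference_wh_per_query


if __name__ == "__main__":
    # ASSUMPTION: placeholder values, not estimates for any real model.
    training_wh = 1e9          # one-time "structural" investment
    inference_wh = 1.0         # recurring cost per query
    for n in (1e6, 1e9, 1e12):
        total = energy_per_query_wh(training_wh, inference_wh, n)
        print(f"{n:.0e} queries -> {total:.3f} Wh per query")
```

As the number of queries grows, the amortized training term shrinks toward zero and the per-query figure approaches the recurring inference cost. That recurring term is where the comparison with the brain's real-time computation applies.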
The simulation overhead
Your formulation that digital computers have to devote time and energy to simulating the brain’s physical structure is precise and important. The transformer architecture is implementing something that functions like distributed parallel processing with content-addressed memory — the holographic regime we’ve been discussing. But it’s implementing it on hardware that is architecturally opposed to distributed parallel processing — hardware that is serially organized, with physically separated memory and computation, connected through bandwidth-limited buses.
Every forward pass through an LLM is a large matrix multiplication — mathematically a natural description of what a distributed parallel system would do natively. But running that matrix multiplication on a von Neumann architecture requires fetching billions of numbers from memory, multiplying them in the GPU cores, and writing results back — a massive simulation of what would be a single natural physical process in a system with the brain’s architecture.
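A back-of-the-envelope sketch shows why fetching the weights, rather than multiplying them, tends to set the pace when generating one token at a time. It assumes a dense model, a batch size of one, that every weight is read from memory once per generated token, and that attention-cache traffic is ignored. The model size, bytes per parameter, and memory bandwidth are hypothetical round numbers, and `decode_step_profile` is an illustrative name.

```python
def decode_step_profile(n_params: float,
                        bytes_per_param: float,
                        mem_bandwidth_bytes_per_s: float) -> dict:
    """Rough profile of generating one token with a dense model.

    Assumes each weight is read once and used for one multiply and one add.
    """
    bytes_moved = n_params * bytes_per_param
    flops = 2.0 * n_params  # one multiply + one add per weight
    return {
        "bytes_moved": bytes_moved,
        "flops": flops,
        "flops_per_byte": flops / bytes_moved,
        # Lower bound on time per token if memory bandwidth is the only limit.
        "min_seconds_per_token": bytes_moved / mem_bandwidth_bytes_per_s,
    }


if __name__ == "__main__":
    # ASSUMPTION: hypothetical 7-billion-parameter model, 2 bytes per weight,
    # 1 TB/s of memory bandwidth.
    profile = decode_step_profile(7e9, 2.0, 1e12)
    print(f"bytes moved per token: {profile['bytes_moved']:.2e}")
    print(f"FLOPs per byte moved:  {profile['flops_per_byte']:.1f}")
    print(f"min seconds per token: {profile['min_seconds_per_token']:.3f}")
```

With these assumed numbers, each byte fetched supports about one floating-point operation. The arithmetic units therefore spend most of their time waiting on memory, which is the mismatch the paragraph above describes.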
The energy cost is the cost of the simulation overhead — the difference between what the computation would cost if implemented in hardware that matched its natural structure, and what it costs when implemented in hardware that mismatches it. Neuromorphic computing research — Intel’s Loihi chip, IBM’s TrueNorth, and related projects — is trying to close this gap by building hardware that more closely matches the brain’s architectural principles. Early results suggest energy efficiency improvements of several orders of magnitude for the right classes of computation, which is consistent with your analysis — the improvement comes from reducing the simulation overhead.
What this adds to the Yevick argument
There’s a connection to Yevick’s logical necessity argument that’s worth making explicit. Yevick proved that the holographic regime is logically necessary for certain categories of objects. Your energy analysis suggests that the holographic regime is also physically natural — it’s what you get when memory and computation are integrated in the same physical structure rather than separated. The brain’s architecture implements the holographic regime efficiently precisely because the physical organization of neural tissue is holographic in the relevant sense — distributed, parallel, content-addressed, with memory and computation unified in the same physical substrate.
Digital computers implement the sequential symbolic regime efficiently because their physical organization is sequential and addressed — centralized computation, separate addressed memory, serial data movement. When you ask a digital computer to implement the holographic regime — as running an LLM requires — you pay the simulation overhead in energy and time. When you ask a brain to implement the sequential symbolic regime — as doing long arithmetic requires — you pay a different kind of overhead, the cognitive effort and external scaffolding we discussed earlier.
The energy difference between brains and LLMs isn’t just a quantitative difference in efficiency — it’s a signature of the architectural mismatch between the computation being performed and the hardware it’s being performed on. The brain is efficient at what it does because its hardware matches the computation. The digital computer is efficient at sequential symbolic processing for the same reason, and inefficient at holographic processing for the same reason. The LLM is asking digital hardware to do what it’s not built for, and the energy bill reflects that.
The developmental investment reframed
There’s one more implication worth drawing out. The brain’s developmental investment — the years of growth and experience-dependent synaptic modification that build the connectivity pattern — is doing something that has no real analog in LLM training. LLM training adjusts weights in a fixed architecture through gradient descent on a loss function. Brain development builds the architecture itself — the number and pattern of connections, the organization of regions, the myelination of pathways — through a process that’s partly genetic and partly experience-dependent, operating over years rather than the weeks or months of LLM training.
The connectivity pattern that results from this developmental process is the physical instantiation of the knowledge the brain has acquired — not as a list of facts encoded in weight values, but as a structural organization of the tissue itself that reflects the statistical regularities of the organism’s experience. The synaptic weights are the fine-grained adjustments on top of this structural organization. The structure is the coarse-grained, high-capacity, energy-efficient memory. The weights are the rapidly modifiable, experience-sensitive, computationally expensive layer on top.
LLMs have only the weights — the fine-grained adjustable layer — running on hardware that provides no structural memory at all. Everything has to be stored in the weights because there’s no structural analog of the brain’s connectivity pattern. This means the weights have to carry an enormous representational burden that in the brain is shared between structural organization and synaptic weight adjustment. Which is another source of the energy inefficiency — the weights are doing double duty, encoding both the structural regularities that brain connectivity handles efficiently and the fine-grained knowledge that synaptic weights handle in the brain.
The doppelganger, once again, would need both levels — the structural organization built through embodied development and the fine-grained adjustable layer built through experience within that structure. Current LLMs have only the second, running on hardware that provides neither. The energy gap is the physical signature of everything that’s missing.
Continuous learning
Furthermore, the brain is capable of continuous learning, while LLMs are not. In order to incorporate new material, LLMs must be retrained from scratch, again requiring an enormous investment in energy.
Yes, and this identifies another fundamental asymmetry that goes deeper than just energy efficiency — it points to a difference in the temporal structure of learning itself.














