NEW SAVANNA: Demis Hassabis and Yann LeCun on Computational Compressibility

Dædalus currently has a double issue, AI & Science: What Is the Future of Discovery?, edited by James M. Manyika. Manyika interviews both Hassabis and LeCun, and both offer remarks relevant to computational compressibility as I discussed it in my recent working paper, On Method: Computational Compressibility in Complex Natural and Cultural Phenomena, though neither uses the term. Here are some passages from those interviews.

Demis Hassabis

In this first passage Hassabis is talking about a well-known problem in computer science, P versus NP, which concerns how the time needed to solve a problem grows with the size of the input. Roughly speaking, what’s at stake is this: if a proposed solution to a problem can be verified quickly (in polynomial time, which defines the class NP, for nondeterministic polynomial time), can a solution also be found from scratch quickly (in polynomial time, the class P), or are there problems whose solutions are easy to check but impractically hard to find? You don’t need to follow the details to understand this passage, pp. 36-38:

It does, and I think those are the interesting limits to test and understand. P equals NP–which attempts to categorize the difficulty of a problem by how much computation it would take to find and check a solution, respectively–is one of the most important questions in science to resolve. I suspect P is not equal to NP, and there are some problems out there that are just not tractable to solve in a practical amount of time without invoking the help of, say, a quantum computer, but we need to understand this a lot better because there may be more nuance here than we previously realized. In our work with AlphaGo and AlphaFold, we’re showing that if you do a lot of precompute, which is not normally considered in these kinds of scenarios, you can seemingly answer some highly complex questions approximately optimally in P (polynomial) time. Neural networks are effectively using massive amounts of precompute to compress knowledge into some efficient artifact. That computed artifact is then available at test time and, for a lot of natural systems, you can use it to narrow down your search space so you don’t have to consider all the possible configurations they could potentially take, but only a much smaller subset that are actually plausible.

Those last two sentences are about computational compressibility. Hassabis then goes on to illustrate:

Let’s take proteins. There are roughly 10^300 possible conformations of an average protein. It would take longer than the age of the universe to enumerate that exhaustively to find the one specific shape it takes, so you have to do something much smarter. You have to learn what patterns there are for different amino acid sequences and then only search a tiny fraction of the possibilities to find the approximately correct solution. That seems to be what we managed to do with AlphaFold. Maybe not perfectly, but to an approximation that is at least good enough for practical purposes. [...]

AlphaFold was our solution to the protein folding or protein structure prediction problem. You start with an amino acid sequence–you can think of it very roughly as the genetic sequence for the protein, a one-dimensional string of letters. In the body or in nature, that string folds up into a 3D structure, and that shape goes a long way toward defining the function of that protein, which is really important for drug discovery and disease understanding.[...]

The way we did it is that there were about 150,000 known structures that had been painstakingly put together by structural biologists over the past thirty to forty years with very expensive equipment like electron microscopes. That was just about enough data to give our AI system clues as to the topology of proteins. Of course they don’t just fold up randomly; there are some constraints, and the AI system learned them. Eventually it was able, within a few seconds, to come up with a plausible structure for an unseen protein.
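To get a feel for the numbers in Hassabis’s protein example, here is a rough back-of-the-envelope calculation in Python. The figure of 10^300 conformations is his; the checking rate of 10^18 conformations per second and the 13.8-billion-year age of the universe are my own assumptions, chosen only for illustration.

```python
# Rough arithmetic only; every constant except CONFORMATIONS is an assumption.
CONFORMATIONS = 10 ** 300             # Hassabis's rough figure for an average protein
AGE_OF_UNIVERSE_YEARS = 13.8e9        # assumption: commonly cited estimate
SECONDS_PER_YEAR = 365.25 * 24 * 3600
CHECKS_PER_SECOND = 10 ** 18          # assumption: one conformation per operation

# How many conformations could be checked since the universe began?
seconds_elapsed = int(AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)
checked = CHECKS_PER_SECOND * seconds_elapsed

# How many universe lifetimes would exhaustive enumeration need?
lifetimes_needed = CONFORMATIONS // checked

print("conformations checked so far: about 10^%d" % (len(str(checked)) - 1))
print("universe lifetimes needed:    about 10^%d" % (len(str(lifetimes_needed)) - 1))
```

Even under these generous assumptions, exhaustive enumeration is hopeless, which is why the search space has to be narrowed before any search begins.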

In this next passage, the first two conditions are about compressibility, p. 39:

We look for three aspects of a problem in determining whether it is suitable to tackle with the AI techniques we have today. First, can the problem be described as or converted into a description of a massive combinatorial space? Perhaps it’s intractably large and normal brute force techniques won’t work. Second, if that’s true, do you have enough data to learn some sort of model of the topology of that space? Or maybe a simulator is available or learnable that can generate some additional synthetic data. Ideally, you have both. Third, you need a clear objective that you’re trying to minimize or maximize. In games, that is winning or maximizing the score. In a natural system, that might be minimizing the free energy in that system. If you can quantify that, you can then use a model to search with the guidance of the objective function toward the optimal solution.
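The three conditions above can be sketched with a toy example. This is not how AlphaGo or AlphaFold work; it is a minimal illustration in which every name (`learn_model`, `guided_search`, `TRUE_STATE`, and so on) is hypothetical. The combinatorial space is all 16-bit strings, the "model" is a majority vote over a few imperfect known examples, and the objective counts mismatches against a hidden true state.

```python
from itertools import combinations

def flip(bits, positions):
    """Return a copy of `bits` with the given positions flipped."""
    return tuple(b ^ 1 if i in positions else b for i, b in enumerate(bits))

def learn_model(known):
    """Precompute step: per-position majority vote over the known examples."""
    return tuple(int(2 * sum(column) > len(known)) for column in zip(*known))

def guided_search(guess, objective, radius):
    """Score only configurations within `radius` bit flips of the model's guess."""
    best, best_score, evaluated = guess, objective(guess), 1
    for r in range(1, radius + 1):
        for positions in combinations(range(len(guess)), r):
            candidate = flip(guess, positions)
            evaluated += 1
            score = objective(candidate)
            if score < best_score:
                best, best_score = candidate, score
    return best, evaluated

# Hypothetical ground truth, standing in for the configuration nature picks.
TRUE_STATE = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1)

def objective(bits):
    """Quantity to minimize: mismatches against the hidden true state."""
    return sum(b != t for b, t in zip(bits, TRUE_STATE))

# Imperfect known examples, each a few flips away from the true state.
known = [flip(TRUE_STATE, {0, 5}), flip(TRUE_STATE, {0, 5, 9}), flip(TRUE_STATE, {3})]

guess = learn_model(known)
best, evaluated = guided_search(guess, objective, radius=2)
full_space = 2 ** len(TRUE_STATE)
print(f"model guess is {objective(guess)} flips from the optimum")
print(f"guided search scored {evaluated} of {full_space} configurations")
```

The model’s guess is not exact, but it is close enough that the objective only has to be evaluated on a small neighborhood rather than on the whole space. That is the sense in which the learned model compresses the problem.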

Yann LeCun

In the following passage LeCun talks about an abstract representation space. That space contains a compressed representation of the phenomenon, pp. 47-48:

I think this is a crucial point and is what I am presently devoting all of my efforts to: devising AI systems that can find an abstract representation of the phenomenon and make predictions in that abstract representation space. This abstract representation eliminates a lot of details about the original observations. And that’s a crucial point because LLMs (large language models) and other generative models are trained to predict every detail of the input. In language, it’s not too much of a problem. You cannot predict exactly which word follows a sequence of words, but you can produce a probability distribution over words. That’s easy because there’s a finite number of possible words. But when you train the model to predict future frames in a video, you can’t represent a useful distribution. You have to make predictions in an abstract representation space, not at the pixel level. So a lot of people in the last few years instinctively said, “let’s just tokenize the world.” Let’s take images from videos and cut them into little squares and turn that into a vector that doesn’t look different from the one that represents a word, and feed this to a gigantic model to predict the next few frames. Frankly, it doesn’t work that well. The reason why is that you simply cannot predict what’s going to happen in a video at the pixel level. There are so many details that are just not in the input. We don’t know how to produce a probability distribution over all possible video frames because it’s mathematically intractable. It’s a problem people have struggled with for decades in statistical physics.

Instead, what we do as scientists is to find a representation of the input that eliminates all the details we cannot predict, and we make predictions in that representation space. That’s not a generative architecture.
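LeCun’s contrast between words and pixels can be made concrete with a small sketch. The vocabulary, the scores, and the 8-by-8 grayscale frame size below are hypothetical, and counting discrete frames is my own simplification of his point, not his argument. A distribution over a finite vocabulary is just a short list of numbers, while even a tiny frame has more possible values than could ever be listed.

```python
import math

def softmax(logits):
    """Turn arbitrary scores into a probability distribution over a finite set."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def count_frames(width, height, levels=256):
    """Number of distinct grayscale frames of the given size."""
    return levels ** (width * height)

vocab = ["sand", "protein", "graphene", "neuron"]   # hypothetical tiny vocabulary
logits = [2.0, 1.0, 0.5, -1.0]                       # hypothetical model scores

probs = softmax(logits)
for word, p in zip(vocab, probs):
    print(f"{word:10s} {p:.3f}")

frames = count_frames(8, 8)
print(f"distinct 8x8 grayscale frames: about 10^{len(str(frames)) - 1}")
```

A table with one probability per word is easy to write down. A table with one probability per possible frame is not, even at this toy resolution, which is one way to see why prediction has to move to a more abstract representation.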

Later, p. 55:

Manyika. Given the advances in AI, and particularly if we go beyond human cognitive levels and AI systems come to understand more than we do, what are the implications for philosophy of science, how we do science, and the nature of scientific understanding?

LeCun. I think that question is not a new one. When we solved PDEs (partial differential equations) numerically with computers, did the computational fluid dynamics simulator understand physics better than we did? It can make a prediction and it’s using an algorithm based on equations that humans came up with.

The next step AI enables is training a machine-learning system to make predictions from data without the manual step of reducing the process to equations. AI allows us to skip having to first build a model of reality that can then be computed. This is powerful because many phenomena in science are collective complex phenomena.

That is the compression step. LeCun continues:

A pile of sand behaves in a particular way, and the theory for this is not entirely clear. The property of materials, particularly complex ones, cannot be directly derived from the elementary equations of quantum mechanics. It’s just too complicated. Another example is the magic angle, 1.1 degrees, at which you rotate two stacked monolayers of carbon, called graphene, to form a superconductor. That’s a collective phenomenon that is extremely difficult to explain. There are various properties of materials of this type that cannot be usefully reduced to a small number of equations from which you can derive this collective behavior. How does intelligence emerge from neurons in interaction? That’s a philosophical question of how a super complex property like intelligence can emerge from a large number of relatively simple elements in interaction, but that’s a pretty high-level thing. At a lower level are questions of how life emerges from the interaction between proteins. This transition is what has baffled scientists for a long time: the transition from the microscopic to the mesoscopic. This is where interesting things happen, like life, for example.

So now there’s a new way of doing science, which is neither completely qualitative and observational nor reductionist, but is a data-driven, AI-powered phenomenological model that may allow us to bridge the gap between microscopic and macroscopic.

That last paragraph is about compression.

Saturday, June 6, 2026
