Friday, July 8, 2022

Video about Yann LeCun's “A Path Towards Autonomous Machine Intelligence” [+symbols, abstraction, memory and program]

Yann LeCun recently released his vision for deep learning, A Path Towards Autonomous Machine Intelligence, Version 0.9.2, 2022-06-27.

Abstract: How could machines learn as efficiently as humans and animals? How could machines learn to reason and plan? How could machines learn representations of percepts and action plans at multiple levels of abstraction, enabling them to reason, predict, and plan at multiple time horizons? This position paper proposes an architecture and training paradigms with which to construct autonomous intelligent agents. It combines concepts such as configurable predictive world model, behavior driven through intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

Yannic Kilcher has uploaded a useful video explaining it:

Yann LeCun's position paper on a path towards machine intelligence combines Self-Supervised Learning, Energy-Based Models, and hierarchical predictive embedding models to arrive at a system that can teach itself to learn useful abstractions at multiple levels and use that as a world model to plan ahead in time.


0:00 - Introduction
2:00 - Main Contributions
5:45 - Mode 1 and Mode 2 actors
15:40 - Self-Supervised Learning and Energy-Based Models
20:15 - Introducing latent variables
25:00 - The problem of collapse
29:50 - Contrastive vs regularized methods
36:00 - The JEPA architecture
47:00 - Hierarchical JEPA (H-JEPA)
53:00 - Broader relevance
56:00 - Summary & Comments

Self-driving car, abstraction, and symbols

During his explanation Kilcher uses learning from a videotape as his main example: The machine is presented with a videotape. What are its options in predicting the future course of the tape? That’s an entirely reasonable example, but it is also worlds away from the sorts of things I’ve thought about, literary texts and abstract ideas.

I bring this up because, as I recall, LeCun’s most frequent example is a self-driving car (I haven’t actually counted instances), which is obviously a highly salient example, but hardly representative of the full range of human reasoning. Near the end, LeCun has a brief discussion of symbols (p. 47):

In the proposed architecture, reasoning comes down to energy minimization or constraint satisfaction by the actor using various search methods to find a suitable combination of actions and latent variables, as stated in Section 3.1.4.

If the actions and latent variables are continuous, and if the predictor and the cost modules are differentiable and relatively well behaved, one can use gradient-based methods to perform the search. But there may be situations where the predictor output changes quickly as a function of the action, and where the action space is essentially discontinuous. This is likely to occur at high levels of abstractions where choices are more likely to be qualitative. A high-level decision for a self-driving car may correspond to “turning left or right at the fork”, while the low-level version would be a sequence of wheel angles.

If the action space is discrete with low cardinality, the actor may use exhaustive search methods. If the action set cardinality, and hence the branching factor, are too large, the actor may have to resort to heuristic search methods, including Monte-Carlo Tree Search, or other gradient-free methods. If the cost function satisfies Bellman’s equations, one may use dynamic programming.

But the efficiency advantage of gradient-based search methods over gradient-free search methods motivates us to find ways for the world-model training procedure to find hierarchical representations with which the planning/reasoning problem constitutes a continuous relaxation of an otherwise discrete problem.

A remaining question is whether the type of reasoning proposed here can encompass all forms of reasoning that humans and animals are capable of.

Yes, it is a question. Without a robust capability for dealing with symbols no model is going to “encompass all forms of reasoning that humans ... are capable of,” though perhaps Yann’s proposal will encompass animal reasoning. We’ll see.

I’m left with the persistent feeling that none of these researchers have thought seriously about language or mathematics, despite the success of large language models. In effect, they leave thinking about language to their models. The existence of large bodies of text allows them to treat texts as the objects of analog, and therefore differentiable, perception – something well worth thinking about from a theoretical point of view. All they think about are their gradient-based architectures. Reasonable enough, I suppose, but it’s no way to scale Mount AGI. 
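To make the contrast in the quoted passage concrete, here is a minimal sketch of the two search styles LeCun mentions: gradient-based search over a continuous action, and exhaustive search over a small discrete action set. Everything in it is an assumption made for illustration: the function names, the quadratic steering cost, and the fork costs are invented, and nothing here is taken from the paper's actual architecture.

```python
# Toy illustration only: the cost functions, numbers, and names below are
# hypothetical and are not taken from LeCun's paper.

def gradient_search(cost_grad, a0, lr=0.1, steps=200):
    """Minimize a differentiable cost over a continuous action by gradient descent."""
    a = a0
    for _ in range(steps):
        a -= lr * cost_grad(a)
    return a


def exhaustive_search(cost, actions):
    """Minimize a cost over a small discrete action set by trying every action."""
    return min(actions, key=cost)


# Low level: a continuous steering angle with a smooth quadratic cost
# (angle - 0.3)**2, whose minimum at 0.3 is an arbitrary choice for the example.
def steering_cost_grad(angle):
    return 2.0 * (angle - 0.3)


best_angle = gradient_search(steering_cost_grad, a0=0.0)

# High level: a qualitative choice at a fork, with made-up costs.
fork_costs = {"left": 4.0, "right": 1.5}
best_turn = exhaustive_search(fork_costs.get, list(fork_costs))
```

The low-level search only needs the gradient of the cost, while the high-level one has to evaluate every option, which is why the size of a discrete action set matters so much in the quoted discussion.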

On the separation of memory and program

Addendum, 6.11.22: Tim Scarfe has a long discussion of LeCun's paper in the first hour or so of this video:

Starting about 2:17:14 Keith Duggar makes a point from a 1988 paper, Connectionism and Cognitive Architecture: A Critical Analysis [Fodor, Pylyshyn], pp. 22-23:

Classical theories are able to accommodate these sorts of considerations because they assume architectures in which there is a functional distinction between memory and program. In a system such as a Turing machine, where the length of the tape is not fixed in advance, changes in the amount of available memory can be affected without changing the computational structure of the machine; viz by making more tape available. By contrast, in a finite state automaton or a Connectionist machine, adding to the memory (e.g. by adding units to a network) alters the connectivity relations among nodes and thus does affect the machine’s computational structure. Connectionist cognitive architectures cannot, by their very nature, support an expandable memory, so they cannot support productive cognitive capacities. The long and short is that if productivity arguments are sound, then they show that the architecture of the mind can’t be Connectionist. Connectionists have, by and large, acknowledged this; so they are forced to reject productivity arguments.

Duggar and Scarfe agree that this is a deep and fundamental issue. A certain kind of very useful abstraction seems to depend on separating the computational procedure from the memory on which it depends. Scarfe (2:18:40): "LeCun would say, well if you have to handcraft the abstractions then learning's gone out the window." Duggar: "Once you take the algorithm and abstract it from memory, that's when you run into all these training problems."
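A toy sketch may help show what the Fodor and Pylyshyn passage is getting at. It checks balanced parentheses in two ways: a fixed program with an expandable stack, and a finite-state recognizer whose "memory" is its set of states. The names and the choice of task are my own assumptions for illustration, and the finite-state table stands in only loosely for a network with a fixed number of units.

```python
# Toy illustration; all names are hypothetical. Inputs are assumed to
# consist only of "(" and ")" characters.

def balanced_with_stack(s):
    """Fixed program, expandable memory: the stack grows as needed."""
    stack = []
    for ch in s:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:
                return False  # unmatched ")"
            stack.pop()
    return not stack


def make_fsa(max_depth):
    """Build a finite-state recognizer whose states are depths 0..max_depth.

    Handling deeper nesting means building a different transition table,
    i.e. changing the structure of the machine itself.
    """
    table = {}
    for depth in range(max_depth + 1):
        if depth < max_depth:
            table[(depth, "(")] = depth + 1
        if depth > 0:
            table[(depth, ")")] = depth - 1
    return table


def balanced_with_fsa(table, s):
    """Run the finite-state recognizer; reject when no transition exists."""
    state = 0
    for ch in s:
        if (state, ch) not in table:
            return False  # nesting too deep for this table, or unmatched ")"
        state = table[(state, ch)]
    return state == 0


deep = "(" * 5 + ")" * 5
small = make_fsa(3)  # copes with nesting up to depth 3
large = make_fsa(5)  # a different, larger machine
```

The stack version accepts `deep` without any change to its code, while `small` rejects it and only the rebuilt `large` table accepts it. That is the sense in which adding memory to the finite-state machine alters its computational structure.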

I wonder about plasticity in natural neural systems. Does that have any bearing on this issue?

Prior art

Also: Jürgen Schmidhuber has posted a comment to LeCun's paper which contains the following:

I want to acknowledge that I am not without a conflict of interest here; my seeking to correct the record will naturally seem self-interested. The truth of the matter is that it is. Much of the closely related work pointed to below was done in my lab, and I naturally wish that it be acknowledged, and recognized. Setting my conflict aside, I ask the reader to study the original papers and judge for themselves the scientific content of these remarks, as I seek to set emotions aside and minimize bias so much as I am capable.

TL;DR: Years ago we published most of what LeCun calls his "main original contributions:” (I) our "cognitive architectures in which all modules are differentiable and many of them are trainable” (1990-), (II) our "hierarchical architecture for predictive world models that learn representations at multiple levels of abstraction and multiple time scales” (1991-), (III) our "self-supervised learning paradigm that produces representations that are simultaneously informative and predictable” (since 1997 for reinforcement learning/world models), and (IV) our predictive models "for hierarchical planning under uncertainty,” including gradient-based neural subgoal generators (1990-), reasoning in abstract concept spaces (1997-), neural nets that "learn to act largely by observation" (2015-), and learn to think (2015-).

More details and numerous references to the original papers can be found under
