NEW SAVANNA: Follow the link to LeCun's brainchild: V-JEPA 2 with a visual world model

Wednesday, June 11, 2025

V-JEPA-v2 https://t.co/7eWfLcJqce
— Yann LeCun (@ylecun) June 11, 2025

This is where you'll end up:

Introducing the V-JEPA 2 world model and new benchmarks for physical reasoning

Takeaways

Meta Video Joint Embedding Predictive Architecture 2 (V-JEPA 2) is a world model that achieves state-of-the-art performance on visual understanding and prediction in the physical world. Our model can also be used for zero-shot robot planning to interact with unfamiliar objects in new environments.

V-JEPA 2 represents our next step toward our goal of achieving advanced machine intelligence (AMI) and building useful AI agents that can operate in the physical world.

We’re also releasing three new benchmarks to evaluate how well existing models can reason about the physical world from video.

Today, we’re excited to share V-JEPA 2, the first world model trained on video that enables state-of-the-art understanding and prediction, as well as zero-shot planning and robot control in new environments. As we work toward our goal of achieving advanced machine intelligence (AMI), it will be important that we have AI systems that can learn about the world as humans do, plan how to execute unfamiliar tasks, and efficiently adapt to the ever-changing world around us.

V-JEPA 2 is a 1.2 billion-parameter model that was built using Meta Joint Embedding Predictive Architecture (JEPA), which we first shared in 2022. Our previous work has shown that JEPA performs well for modalities like images and 3D point clouds. Building on V-JEPA, our first model trained on video that we released last year, V-JEPA 2 improves action prediction and world modeling capabilities that enable robots to interact with unfamiliar objects and environments to complete a task. We’re also sharing three new benchmarks to help the research community evaluate how well their existing models learn and reason about the world using video. By sharing this work, we aim to give researchers and developers access to the best models and benchmarks to help accelerate research and progress—ultimately leading to better and more capable AI systems that will help enhance people’s lives.

And so on and so forth. You'll get three videos and links to everything.