From DeepMind's blog, Generally capable agents emerge from open-ended play:
In recent years, artificial intelligence agents have succeeded in a range of complex game environments. For instance, AlphaZero beat world-champion programs in chess, shogi, and Go after starting out with knowing no more than the basic rules of how to play. Through reinforcement learning (RL), this single system learnt by playing round after round of games through a repetitive process of trial and error. But AlphaZero still trained separately on each game — unable to simply learn another game or task without repeating the RL process from scratch. The same is true for other successes of RL, such as Atari, Capture the Flag, StarCraft II, Dota 2, and Hide-and-Seek. DeepMind’s mission of solving intelligence to advance science and humanity led us to explore how we could overcome this limitation to create AI agents with more general and adaptive behaviour. Instead of learning one game at a time, these agents would be able to react to completely new conditions and play a whole universe of games and tasks, including ones never seen before.
Today, we published "Open-Ended Learning Leads to Generally Capable Agents," a preprint detailing our first steps to train an agent capable of playing many different games without needing human interaction data. We created a vast game environment we call XLand, which includes many multiplayer games within consistent, human-relatable 3D worlds. This environment makes it possible to formulate new learning algorithms, which dynamically control how an agent trains and the games on which it trains. The agent’s capabilities improve iteratively as a response to the challenges that arise in training, with the learning process continually refining the training tasks so the agent never stops learning. The result is an agent with the ability to succeed at a wide spectrum of tasks — from simple object-finding problems to complex games like hide and seek and capture the flag, which were not encountered during training. We find the agent exhibits general, heuristic behaviours such as experimentation, behaviours that are widely applicable to many tasks rather than specialised to an individual task. This new approach marks an important step toward creating more general agents with the flexibility to adapt rapidly within constantly changing environments.
When you look under the hood:
Analysing the agent’s internal representations, we can say that by taking this approach to reinforcement learning in a vast task space, our agents are aware of the basics of their bodies and the passage of time and that they understand the high-level structure of the games they encounter. Perhaps even more interestingly, they clearly recognise the reward states of their environment. This generality and diversity of behaviour in new tasks hints toward the potential to fine-tune these agents on downstream tasks. For instance, we show in the technical paper that with just 30 minutes of focused training on a newly presented complex task, the agents can quickly adapt, whereas agents trained with RL from scratch cannot learn these tasks at all.
By developing an environment like XLand and new training algorithms that support the open-ended creation of complexity, we’ve seen clear signs of zero-shot generalisation from RL agents. Whilst these agents are starting to be generally capable within this task space, we look forward to continuing our research and development to further improve their performance and create ever more adaptive agents.
Preprint of technical paper: Open-Ended Learning Leads to Generally Capable Agents.
Also, check out the comments a Less Wrong.