Google’s DeepMind has developed the first Artificial Intelligence (AI) system to outperform humans in all 57 Atari games, overcoming a wide range of difficulties, game characteristics and strategic styles. The AI, a deep reinforcement learning algorithm, gets its name, Agent57, from this unprecedented feat. It uses a meta-controller along with an algorithm for efficient exploration, adapting to both the long-term and short-term behaviours of the agent. The findings, announced in an extensive blog post and a preprint paper published on DeepMind’s website, could form the basis on which enterprises deploy workplace AIs that not only handle mundane but essential tasks such as data entry, but also reason about their environment, thus boosting an enterprise’s productivity.

While measuring an AI’s problem-solving capabilities through video games may seem childish or geeky, the medium actually provides a variety of challenges that can be solved using a combination of strategy, memory and exploration. For these reasons, the Arcade Learning Environment’s Atari 2600 suite of 57 games, aka Atari57, was established as a benchmark in 2012 and is widely used by the research community to measure the progress of AI agents.

Agent57 is the product of a long line of AI ancestors, as DeepMind’s quest to master Atari57 started in 2012. The Deep Q-Network (DQN) was DeepMind’s first AI to beat a handful of Atari games. While DQN and several similar successors consistently won at the easier Atari games, four games - Montezuma’s Revenge, Pitfall, Solaris and Skiing - proved tough to beat.

“Montezuma’s Revenge and Pitfall require extensive exploration to obtain good performance… Solaris and Skiing are long-term credit assignment problems: in these games, it’s challenging to match the consequences of an agent’s actions to the rewards it receives. Agents must collect information over long time scales to get the feedback necessary to learn,” states the blog.

This means that, to win at these games, the AI was presented with a set of problems that required experimentation as well as memorisation, along with the decision-making skill to choose between those approaches. The researchers therefore updated the DQN system with several improvements, including a form of memory that lets it base decisions on things it has previously seen in the game, and reward systems that encourage the AI to explore its options more fully before settling on a strategy.
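To make the second improvement concrete: in intrinsic-motivation exploration, the mechanism behind DeepMind’s Never Give Up agent on which Agent57 builds, the agent is trained not on the game score alone but on the score plus a novelty bonus for reaching unfamiliar states. The Python sketch below is purely illustrative; the function names, the distance-based novelty measure and the weight beta are assumptions made for exposition, not DeepMind’s actual implementation.

```python
import numpy as np

def novelty_bonus(state_embedding, episodic_memory, k=10):
    """Hypothetical novelty score: states unlike anything visited so far
    this episode score high, loosely in the spirit of Never Give Up's
    episodic memory (not DeepMind's actual code)."""
    if not episodic_memory:
        return 1.0  # everything is novel at the start of an episode
    dists = np.sort([float(np.linalg.norm(m - state_embedding))
                     for m in episodic_memory])
    return float(np.mean(dists[:k]))  # far from nearest neighbours => big bonus

def augmented_reward(extrinsic_reward, state_embedding, episodic_memory, beta=0.3):
    """Train on score + beta * novelty: beta trades off exploiting the
    game's reward signal against exploring new states."""
    bonus = novelty_bonus(state_embedding, episodic_memory)
    episodic_memory.append(state_embedding)
    return extrinsic_reward + beta * bonus
```

A fixed beta bakes a single exploration appetite into the whole run; Agent57’s key addition is to maintain a family of such trade-offs and learn which one to use, as the researchers describe next.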

“To achieve Agent57, we combined our previous exploration agent, Never Give Up, with a meta-controller. This agent computes a mixture of long and short term intrinsic motivation to explore and learn a family of policies, where the choice of policy is selected by the meta-controller. The meta-controller allows each actor of the agent to choose a different trade-off between near vs. long term performance, as well as exploring new states vs. exploiting what’s already known. Reinforcement learning is a feedback loop: the actions chosen determine the training data. Therefore, the meta-controller also determines what data the agent learns from,” write the researchers, explaining how Agent57 balances the trade-off between when to explore and when to exploit, as well as the time horizon over which it is most useful to learn.
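In effect, the meta-controller is a bandit that picks, episode by episode, which member of a family of policies each actor should run, where each member pairs an exploration weight (beta) with a discount factor (gamma) that sets the learning horizon. The sketch below uses a simple epsilon-greedy bandit for readability; Agent57 itself uses a sliding-window UCB bandit, and the specific (beta, gamma) values and names here are illustrative assumptions.

```python
import random

# Hypothetical policy family: each entry pairs an exploration weight (beta)
# with a discount factor (gamma). High beta = exploration-heavy; gamma close
# to 1 = long-horizon credit assignment.
POLICY_FAMILY = [(0.5, 0.99), (0.3, 0.995), (0.1, 0.997), (0.0, 0.9997)]

class MetaController:
    """Epsilon-greedy bandit over the policy family (a simplification of
    Agent57's per-actor sliding-window UCB bandit)."""

    def __init__(self, epsilon=0.2):
        self.epsilon = epsilon
        self.mean_returns = [0.0] * len(POLICY_FAMILY)
        self.counts = [0] * len(POLICY_FAMILY)

    def choose_policy(self):
        """Pick which exploration/horizon trade-off to run next episode."""
        if 0 in self.counts or random.random() < self.epsilon:
            return random.randrange(len(POLICY_FAMILY))
        return max(range(len(POLICY_FAMILY)), key=lambda j: self.mean_returns[j])

    def update(self, j, episode_return):
        """Feed back the undiscounted score the chosen policy achieved."""
        self.counts[j] += 1
        self.mean_returns[j] += (episode_return - self.mean_returns[j]) / self.counts[j]

# Usage: pick a trade-off, run an episode with it, report the score back.
controller = MetaController()
j = controller.choose_policy()
beta, gamma = POLICY_FAMILY[j]
# ... run one episode with intrinsic-reward weight beta and discount gamma ...
controller.update(j, episode_return=120.0)
```

Because the chosen policy determines which transitions the actor generates, the meta-controller also shapes the data the shared network learns from, which is exactly the feedback loop the quote describes.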

When Agent57 was tested against its predecessors, namely Recurrent Replay Distributed DQN (R2D2) and MuZero, it came out a clear winner, with more consistent performance across all games. However, Agent57 has significant scope for improvement, according to its creators. Currently, the algorithm is resource-heavy, making it extremely computationally expensive to run. Additionally, while it performs impressively on complicated games, its performance on simpler games was sub-par, even when compared to that of simple AI algorithms. “This by no means marks the end of Atari research, not only in terms of data efficiency but also in terms of general performance … Key improvements to use might be enhancements in the representations that Agent57 uses for exploration, planning, and credit assignment,” wrote the authors on the blog.
