DeepMind has published a paper describing its early work on video game AI agents that can understand fuzzy human concepts and communicate with people on their own terms.

Most experts in artificial intelligence (AI) today agree that it is impractical to write computer code that captures the complexities of situated interactions. Instead, recent machine learning (ML) research has emphasized learning these kinds of behaviours directly from data. The researchers established a research framework within a video game environment to investigate these learning-based approaches and to rapidly develop agents that can interpret human instructions and act safely in unstructured settings.

Much of the recent progress in video game AI has come from optimizing a game's score: clear-cut wins and losses computed by code have produced powerful agents for games such as StarCraft and Dota. Here, instead of optimizing a score, the researchers ask human participants to invent tasks of their own and to judge the agent's progress themselves.
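As a rough illustration of this shift, here is a minimal sketch (all names hypothetical, not DeepMind's code) of how the reward signal changes when a human rating replaces a scripted game score:

```python
# Hypothetical sketch: swapping a scripted game score for a human rating.

def score_based_reward(game_state):
    # Classic setup: the environment's own scoring code defines success.
    return game_state["score_delta"]

def human_based_reward(human_rating):
    # Playhouse-style setup: a person invents the task and rates progress,
    # e.g. on a 0-1 scale, and that rating becomes the reward signal.
    return float(human_rating)
```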

The researchers used this method to create a study paradigm that lets them improve agent behaviour through grounded, open-ended interaction with humans. Although still in its early stages, the paradigm produces agents that can listen, talk, ask questions, navigate, search and retrieve, manipulate objects, and perform many other tasks in real-time.

The playhouse

The framework begins with people interacting with one another in the virtual world of a video game. Then, using imitation learning, the researchers instilled agents with a broad but unrefined set of behaviours. This "behaviour prior" is essential for enabling interactions that people can evaluate; without this initial imitation phase, agents would act randomly and be nearly impossible to interact with. Further human evaluation of the agent's behaviour, and optimization of those evaluations using reinforcement learning (RL), then yields progressively better agents.
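A minimal sketch of this three-stage pipeline, with hypothetical function and dataset names (imitation, human judgement collection, then RL), might look like this; the real system is far more elaborate:

```python
# Hypothetical sketch of the pipeline described above; not DeepMind's actual code.

def behaviour_cloning(human_demonstrations):
    """Stage 1: imitation learning gives the agent a broad 'behaviour prior'."""
    # In practice: maximise log-likelihood of human actions given observations.
    policy = {"params": "initialised from supervised imitation"}
    return policy

def collect_human_judgements(policy, num_episodes):
    """Stage 2: people interact with the agent and rate its behaviour."""
    # Each record pairs an interaction episode with a human evaluation.
    return [{"episode": i, "rating": 1.0} for i in range(num_episodes)]

def reinforcement_learning(policy, judgements):
    """Stage 3: optimise the policy against rewards derived from the ratings."""
    for record in judgements:
        reward = record["rating"]
        # In practice: a policy-gradient or actor-critic update would go here.
        policy["params"] = f"updated with reward {reward}"
    return policy

# The cycle can be iterated: better agents elicit richer human feedback.
policy = behaviour_cloning(human_demonstrations=["demo_1", "demo_2"])
judgements = collect_human_judgements(policy, num_episodes=3)
policy = reinforcement_learning(policy, judgements)
```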

Operation

Initially, the researchers created a rudimentary video game environment based on the concept of a "playhouse". This environment provided a safe context for human-agent interactions and made it easy to collect large volumes of interaction data quickly. The rooms, furnishings, and objects in the house were arranged differently for each encounter. The researchers also designed an interface for interaction.
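To make the per-encounter randomisation concrete, here is a hypothetical sketch (room, furniture, and object names are illustrative only) of how a playhouse layout might be re-sampled for each episode:

```python
import random

# Hypothetical sketch: a new playhouse layout is sampled for every interaction.
ROOMS = ["kitchen", "bedroom", "bathroom", "living room"]
FURNITURE = ["table", "shelf", "sofa", "bed"]
OBJECTS = ["red block", "blue block", "cup", "toy robot", "book"]

def sample_playhouse(seed=None):
    rng = random.Random(seed)
    layout = {}
    for room in rng.sample(ROOMS, k=rng.randint(2, len(ROOMS))):
        layout[room] = {
            "furniture": rng.sample(FURNITURE, k=rng.randint(1, 3)),
            "objects": rng.sample(OBJECTS, k=rng.randint(1, 4)),
        }
    return layout

print(sample_playhouse(seed=0))
```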

Both the human and the agent have avatars in the game that let them move about and affect the world. They can engage in activities such as carrying objects and handing them to one another, building a tower of blocks, or cleaning a room together. Human participants set up the interactions by navigating the world, setting goals, and posing questions to the agents. In total, the research accumulated more than 25 years of real-time interactions between agents and hundreds of (human) participants.
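A single real-time step of such an interaction might be structured roughly as follows; this is a hypothetical sketch of the interface (types and names assumed), not the actual system:

```python
from dataclasses import dataclass

# Hypothetical sketch: one real-time step in which the agent observes the
# playhouse plus the chat history and emits a movement and an utterance.

@dataclass
class Observation:
    image: list         # first-person view of the playhouse (placeholder)
    chat_history: list  # text exchanged so far

@dataclass
class Action:
    movement: str       # e.g. "move_forward", "grasp", "release"
    utterance: str      # what the agent says back; may be empty

def agent_step(observation: Observation) -> Action:
    # A trained policy would map the multimodal observation to an action;
    # here we return a fixed placeholder.
    return Action(movement="move_forward", utterance="Which block do you mean?")

obs = Observation(image=[], chat_history=["Please put the red block on the table."])
print(agent_step(obs))
```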

Evaluation

The trained agents are capable of many tasks, some of which the researchers who created them did not foresee. For example, the researchers found that the agents can build rows of objects in two alternating colours, or fetch an object from the house that resembles one the user is currently holding.

Because simple meanings can be composed, language enables an almost limitless variety of tasks and questions. The researchers did not need to specify the details of agent behaviour in advance; instead, the people who took part in these interactions generated the tasks and questions during the exchanges themselves.

Conclusion

Recent experiments demonstrate that the RL process can be iterated to improve agent behaviour continuously, producing increasingly capable agents. For certain kinds of complex instructions, it is even possible to develop agents that outperform average human players.

The concept of training AI with human preferences as a reward has existed for some time. The paper Deep reinforcement learning from human preferences pioneered methods for aligning neural network-based agents with human preferences. More recent work on turn-based dialogue agents has explored similar ideas for training assistants with RL from human feedback. This research adapts and extends those approaches to create flexible AI agents that can master a wide range of multimodal, embodied, real-time interactions with humans.
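The core idea from Deep reinforcement learning from human preferences is to fit a reward model so that behaviour segments humans prefer receive higher predicted reward (a Bradley-Terry-style pairwise loss). A minimal sketch, with illustrative values only:

```python
import math

# Sketch of a pairwise preference loss: the reward model should assign higher
# reward to the segment the human preferred. All numbers are illustrative.

def preference_loss(reward_preferred, reward_other):
    """Negative log-probability that the preferred segment wins."""
    p_preferred = math.exp(reward_preferred) / (
        math.exp(reward_preferred) + math.exp(reward_other)
    )
    return -math.log(p_preferred)

# Example: the reward model scores two behaviour clips; humans preferred clip A.
loss = preference_loss(reward_preferred=1.2, reward_other=0.4)
print(round(loss, 3))  # the loss shrinks as the preferred clip scores higher
```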
