Researchers affiliated with Google, Microsoft, Facebook, Carnegie Mellon, the University of Toronto, the University of Pennsylvania, and the University of California, Berkeley have come up with a self-supervised AI that leverages planning to tackle previously unseen goals. Dubbed Plan2Explore, the algorithm was trained without human interaction or a specific task to perform, and it outperforms prior self-supervised exploration methods.
According to Turing Award winners Yoshua Bengio and Yann LeCun, self-supervision is the key to human-level intelligence. Plan2Explore takes a step in that direction, as it learns to complete new tasks without specific training on those tasks.
To explore efficiently, Plan2Explore quantifies the uncertainty of its own predictions. This prompts the system to seek out areas of the environment where uncertainty is high, and the model is then trained on the newly gathered data to reduce that uncertainty. The process repeats, so that Plan2Explore optimizes its behaviour from the trajectories it predicts.
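To make this idea concrete, here is a minimal Python sketch of uncertainty-driven exploration. It assumes a hypothetical ensemble of one-step forward models (the `EnsembleModel` class and its random linear members are placeholders for illustration, not the paper's architecture); the exploration bonus is the variance among the ensemble's predictions, which is high exactly where the model is uncertain:

```python
import numpy as np

class EnsembleModel:
    """Hypothetical stand-in for an ensemble of learned one-step predictors."""
    def __init__(self, n_models, state_dim, action_dim, rng=None):
        self.rng = rng or np.random.default_rng(0)
        # Each member is a random linear predictor here, purely for illustration.
        self.weights = [
            self.rng.normal(size=(state_dim + action_dim, state_dim))
            for _ in range(n_models)
        ]

    def predict(self, state, action):
        """Return each ensemble member's prediction of the next state."""
        x = np.concatenate([state, action])
        return np.stack([x @ w for w in self.weights])

def intrinsic_reward(ensemble, state, action):
    """Exploration bonus: mean variance of ensemble predictions.

    High disagreement marks transitions the agent has not yet learned,
    so seeking it out steers exploration toward the unknown."""
    preds = ensemble.predict(state, action)  # shape: (n_models, state_dim)
    return preds.var(axis=0).mean()

# Usage: score candidate actions by how much novelty they are expected to reveal.
ensemble = EnsembleModel(n_models=5, state_dim=8, action_dim=2)
state = np.zeros(8)
for action in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    print(action, intrinsic_reward(ensemble, state, action))
```

Training the model on the data collected this way shrinks the disagreement, and the loop continues with whatever regions remain uncertain.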
Unlike learning algorithms that are developed with supervision and trained on specific, labelled data sets, Plan2Explore generates labels from the data itself by revealing relationships between its parts. The algorithm observes and interacts with its environment and collates the resulting data into a world model. From this model, Plan2Explore derives behaviours using Dreamer, a DeepMind-designed algorithm that plans ahead to select actions by anticipating their long-term outcomes. When a reward function is later supplied, the agent adapts to tasks such as standing, walking, and running. In experiments with the DeepMind Control Suite, Plan2Explore reached goals without any goal-specific information.
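The planning step can be sketched in the same spirit. In the toy Python example below, `step` stands in for a learned world model and `plan` scores random candidate action sequences by their imagined discounted return, keeping the first action of the best one; the names and the linear dynamics are illustrative assumptions, not Dreamer's actual latent-space machinery:

```python
import numpy as np

def step(state, action):
    """Hypothetical learned dynamics: a toy linear transition for illustration."""
    return 0.9 * state + action

def imagined_return(state, actions, reward_fn, gamma=0.99):
    """Sum of discounted predicted rewards along an imagined trajectory."""
    total, discount = 0.0, 1.0
    for a in actions:
        state = step(state, a)
        total += discount * reward_fn(state)
        discount *= gamma
    return total

def plan(state, reward_fn, horizon=10, candidates=100, rng=None):
    """Pick the first action of the best imagined action sequence."""
    rng = rng or np.random.default_rng(0)
    best_seq, best_ret = None, -np.inf
    for _ in range(candidates):
        seq = rng.uniform(-1.0, 1.0, size=horizon)
        ret = imagined_return(state, seq, reward_fn)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq[0]

# Usage: the same learned model can serve different tasks by swapping reward_fn.
stand_reward = lambda s: -abs(s - 1.0)  # hypothetical "stand" objective
print(plan(state=0.0, reward_fn=stand_reward))
```

Because the reward function is supplied only at planning time, the same world model can be reused across tasks, which is what lets Plan2Explore adapt without task-specific training.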
“Reinforcement learning allows solving complex tasks; however, the learning tends to be task-specific, and the sample efficiency remains a challenge,” say the researchers in a paper describing the algorithm. “By presenting a method that can learn effective behaviour for many different tasks in a scalable and data-efficient manner, we hope this work constitutes a step toward building scalable real-world reinforcement learning systems.”