On June 16, 2022, Meta AI unveiled new scientific research aimed at helping AI understand the physical world more flexibly and efficiently. Meta AI researchers state that AI systems must learn to navigate the complexity of the physical world just as people do when they create and interact with immersive new experiences in the metaverse, where they move between virtual realms and our physical world through augmented reality.

For example, AR glasses that show us where we left our keys require essential new technologies that let AI understand the layout and dimensions of unfamiliar, ever-changing settings without relying on high-compute resources such as pre-loaded maps. As people, we don't need to know the exact location or dimensions of our coffee table to walk around it without banging into its corners (most of the time).

Research outcomes

All of this work advances visual navigation for embodied AI, a field of research focused on training AI systems through interaction in 3D simulations rather than with traditional 2D datasets.

  • Researchers built a point-goal navigation model that can find its way around a new area without a map or GPS sensor. They trained it with Habitat 2.0, Meta's state-of-the-art embodied AI platform, which runs simulations many times faster than real time (see the episode-loop sketch after this list).
  • To further improve training without maps, researchers built and released Habitat-Web, a training dataset of over 100K human demonstrations for object-goal navigation. For each demonstration, a paid Mechanical Turk worker is given a task (such as "find the chest of drawers") and controls the virtual robot through a web browser interface on their own computer.
  • In a new zero-shot experience learning framework, researchers built the first "plug and play" modular approach, which lets robots adapt to different semantic navigation tasks and goal modalities without retraining.
  • Researchers also continue to push for efficiency: they developed a new approach to object-goal navigation that achieves state-of-the-art results while cutting training time by a factor of 1,600 compared with older methods.
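To give a concrete sense of how such agents are trained and evaluated, here is a minimal sketch of the episode loop exposed by the open-source Habitat platform (habitat-lab), stepping a stand-in random agent through a point-goal task. It is illustrative rather than a reproduction of the researchers' setup: the config path and sensor contents vary between habitat-lab releases, and the scene datasets have to be installed separately.

```python
import habitat  # open-source habitat-lab API; scene datasets must be downloaded separately

# Load a stock point-goal navigation task; the exact config path differs between
# habitat-lab releases, so treat this one as illustrative.
config = habitat.get_config("configs/tasks/pointnav.yaml")
env = habitat.Env(config=config)

observations = env.reset()  # dict of sensor readings, e.g. RGB/depth frames and the goal reading
while not env.episode_over:
    # A trained policy would map observations to an action here;
    # a randomly sampled action stands in for it in this sketch.
    observations = env.step(env.action_space.sample())

env.close()
```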

Navigating without GPS

Researchers from Ukrainian Catholic University, the Georgia Institute of Technology, and Meta AI have developed new ways to improve visual odometry, the technique by which an AI determines where it is based only on what it sees. Their new data-augmentation method trains simple but effective neural models without adding human annotations to the data. The result is that robust visual odometry alone is enough to push the state of the art on the Realistic PointNav task, without GPS or compass data, from 71.7 per cent success to 94 per cent success, even when the action dynamics are noisy.
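As a rough illustration of the role visual odometry plays (not a reproduction of the authors' model), the sketch below integrates per-step egomotion estimates into a running 2-D pose and uses that pose to keep updating where the goal lies relative to the agent, which is the reading a GPS+Compass sensor would otherwise supply. The hand-written egomotion values stand in for the output of a learned model that would predict them from consecutive RGB-D frames.

```python
import numpy as np

def integrate_egomotion(pose, delta):
    """Update a 2-D pose (x, y, heading) with an egomotion estimate
    (forward, lateral, rotation) expressed in the agent's previous frame."""
    x, y, heading = pose
    forward, lateral, rotation = delta
    # Rotate the body-frame translation into the world frame, then accumulate.
    x += forward * np.cos(heading) - lateral * np.sin(heading)
    y += forward * np.sin(heading) + lateral * np.cos(heading)
    heading = (heading + rotation + np.pi) % (2 * np.pi) - np.pi
    return np.array([x, y, heading])

def goal_relative_to_agent(pose, goal_xy):
    """Express the fixed goal position as distance and bearing in the agent's
    current frame, i.e. what a GPS+Compass sensor would normally provide."""
    x, y, heading = pose
    dx, dy = goal_xy[0] - x, goal_xy[1] - y
    distance = np.hypot(dx, dy)
    bearing = np.arctan2(dy, dx) - heading
    return distance, (bearing + np.pi) % (2 * np.pi) - np.pi

# Toy rollout: a visual-odometry model would supply `delta` from consecutive
# frames; here we feed in made-up estimates.
pose = np.array([0.0, 0.0, 0.0])   # start at the origin, facing +x
goal = np.array([2.0, 1.0])        # goal given once, in the start frame
for delta in [(0.25, 0.0, 0.0), (0.25, 0.0, np.pi / 6), (0.25, 0.0, 0.0)]:
    pose = integrate_egomotion(pose, delta)
    print(goal_relative_to_agent(pose, goal))
```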

Even though their method doesn't solve the benchmark entirely, this research suggests that explicit mapping may not be necessary for navigation, even in real-world settings.

Zero-shot navigation learning

Most embodied AI systems perform well on discrete, well-defined tasks, where training depends on the objective type (e.g., "identify an object," "navigate to a room") or the goal modality (e.g., text, audio). In the real world, however, agents must be able to adapt their skills on the fly, without resource-intensive maps or lengthy retraining.

Image source: Meta AI

Researchers from The University of Texas at Austin and Meta AI have created a model that captures the essential skills for semantic visual navigation and then applies them to different target tasks in a 3D environment without additional retraining, in a first-of-its-kind zero-shot experience learning (ZSEL) framework.
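One way to picture this "plug and play" idea is as a navigation backbone that is trained once and then frozen, with small goal encoders mapping each new goal modality (an image, a text label, an audio clip) into the embedding space the backbone already understands. The sketch below is a structural illustration under that assumption, not the authors' architecture; every class, dimension, and method name here is hypothetical.

```python
import torch
import torch.nn as nn

EMBED_DIM = 128  # shared goal-embedding size (illustrative)

class NavigationBackbone(nn.Module):
    """Perception-and-control policy, trained once on navigation experience and then frozen."""
    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.obs_encoder = nn.Linear(obs_dim, EMBED_DIM)
        self.policy_head = nn.Linear(2 * EMBED_DIM, num_actions)

    def forward(self, observation: torch.Tensor, goal_embedding: torch.Tensor) -> torch.Tensor:
        features = torch.cat([self.obs_encoder(observation), goal_embedding], dim=-1)
        return self.policy_head(features)  # action logits

class ImageGoalEncoder(nn.Module):
    """Maps an image goal into the shared embedding space."""
    def __init__(self, image_dim: int):
        super().__init__()
        self.net = nn.Linear(image_dim, EMBED_DIM)

    def forward(self, goal):
        return self.net(goal)

class TextGoalEncoder(nn.Module):
    """Maps a category/text goal (e.g. "chest of drawers") into the same space."""
    def __init__(self, vocab_size: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, EMBED_DIM)

    def forward(self, goal_token):
        return self.embedding(goal_token)

# The backbone is frozen; only the lightweight goal encoders differ between
# tasks, so switching goal modality requires no backbone retraining.
backbone = NavigationBackbone(obs_dim=64, num_actions=4)
for p in backbone.parameters():
    p.requires_grad = False

obs = torch.randn(1, 64)
image_goal = ImageGoalEncoder(image_dim=512)(torch.randn(1, 512))
text_goal = TextGoalEncoder(vocab_size=100)(torch.tensor([17]))
print(backbone(obs, image_goal).shape, backbone(obs, text_goal).shape)
```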

Conclusion

Researchers investigated the question, "Can a self-guided agent move around in a new environment without building a map?" in (simulated) realistic settings. To answer it, they first showed that, when given ground-truth localization (GPS+Compass), map-less agents can overcome actuation and sensor noise and learn to navigate with near-perfect performance, revealing that localization is the limiting factor.

Next, researchers plan to apply these navigation breakthroughs to mobile manipulation to create agents that perform specific tasks, such as "find my wallet and bring it back to me." They also outlined a variety of new and intriguing challenges: How well does this simulation work transfer to physical robots? How can an embodied agent self-supervise its learning without human intervention in reward engineering, demonstrations, or 3D annotations? And how can simulation and learning be scaled to the next level of speed?

