Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Meta have introduced a novel training method for general-purpose robots inspired by large language models (LLMs) like GPT-4.

This approach, demonstrated by a robotic arm feeding a dog named Momo, simplifies and speeds up training by drawing on an extensive, diverse dataset. Unlike conventional techniques that rely on task-specific data, the method aligns data of many types and from many sources, enabling robots to learn a broad range of skills efficiently.

Typically, robot training involves collecting data specific to each task and robot model, a labour-intensive and costly process that limits a robot's ability to adapt to new environments. The researchers' approach instead merges data from multiple modalities, such as vision sensors and arm position encoders, into a shared “language” that a generative AI model can interpret. This reduces training cost and time, and the method outperformed traditional task-specific training by more than 20 per cent in both simulations and real-world experiments.

The core innovation, called Heterogeneous Pretrained Transformers (HPT), is built on transformers, the model architecture that underpins many LLMs, and can process varied data such as visual images and proprioceptive signals, which track a robot's position and speed. By converting these inputs into tokens, the transformer maps them all into a shared representation space, which lets it transfer pretrained skills to new tasks with minimal additional training. This unified design also scales: the larger the model, the better it performs.
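The article does not describe HPT's implementation, so the following is only a minimal, illustrative Python sketch of the general idea in the paragraph above: per-modality tokenizers project differently shaped inputs (image-patch features, joint states) into tokens of one shared width, and a single attention pass mixes them into one sequence. Everything here is an assumption for illustration: the names `ModalityTokenizer`, `encode_observation` and `D_MODEL`, the dimensions, the use of plain linear projections, and the random, untrained weights. It is not the researchers' code.

```python
import math
import random

D_MODEL = 8  # hypothetical width of the shared token space

_rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def random_projection(in_dim, out_dim):
    """Random (untrained) weight matrix standing in for a learned linear layer."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[_rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(weights, vec):
    """Multiply an out_dim x in_dim matrix by a length-in_dim vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class ModalityTokenizer:
    """Hypothetical per-modality encoder: maps raw feature chunks to D_MODEL-wide tokens."""

    def __init__(self, chunk_dim):
        self.chunk_dim = chunk_dim
        self.proj = random_projection(chunk_dim, D_MODEL)

    def __call__(self, chunks):
        tokens = []
        for chunk in chunks:
            if len(chunk) != self.chunk_dim:
                raise ValueError(f"expected {self.chunk_dim} features, got {len(chunk)}")
            tokens.append(matvec(self.proj, chunk))
        return tokens


def self_attention(tokens):
    """One single-head scaled dot-product attention pass (identity Q/K/V, illustration only)."""
    scale = 1.0 / math.sqrt(D_MODEL)
    mixed = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        mixed.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(D_MODEL)])
    return mixed


# One shared vision tokenizer plus one proprioception tokenizer per (hypothetical) robot body.
vision_tokenizer = ModalityTokenizer(chunk_dim=16)  # e.g. 16 features per image patch
arm7_tokenizer = ModalityTokenizer(chunk_dim=7)  # 7-joint arm state
arm6_tokenizer = ModalityTokenizer(chunk_dim=6)  # 6-joint arm state


def encode_observation(image_patches, joint_state, proprio_tokenizer):
    """Tokenize both modalities, mix them in one sequence, and mean-pool to a D_MODEL vector."""
    tokens = vision_tokenizer(image_patches) + proprio_tokenizer([joint_state])
    mixed = self_attention(tokens)
    return [sum(t[i] for t in mixed) / len(mixed) for i in range(D_MODEL)]
```

Calling `encode_observation` with four 16-feature image patches and either a 7-value or a 6-value joint state returns a vector of the same length (`D_MODEL`) in both cases, which is the sense in which differently equipped robots end up in one shared representation space. In a real system the projections and attention weights would be learned during pretraining, and the shared transformer would be far deeper.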

One significant challenge the researchers faced was assembling a large and varied enough dataset for pretraining. Their efforts produced a dataset drawn from 52 sources, with more than 200,000 robot trajectories across four categories, including human demonstration videos and simulations. The HPT model adapted successfully to a range of tasks, even those dissimilar to its pretraining data. The researchers next plan to study how data diversity could further improve HPT, including enabling it to process unlabelled data.

This research, partially funded by the Amazon Greater Boston Tech Initiative and the Toyota Research Institute, represents a step towards more adaptive, multi-skilled robots capable of learning from vast, heterogeneous datasets. With HPT, robots are positioned to perform more complex, varied tasks, marking a significant advancement in AI-driven robotics.

Source: Article, MIT News

Image source: Unsplash
