AI algorithms have proven their capabilities in protein detection, folding and structure prediction. Small proteins play a significant role in regulating immune response, inflammation and neurodegenerative diseases. Proteins are among the most complex molecules known, and modelling them remains a hard problem for AI. Applying ML and AI to protein science opens new lines of inquiry into cell and proteome homeostasis, which is essential to life.

MIT researchers have gone a step further, using AI to design new proteins that go beyond those found in nature. They developed ML algorithms that can generate proteins with specific structural features. These proteins could be used to make materials with particular mechanical properties, such as stiffness or elasticity.

Earlier, DeepMind had applied its AlphaFold system to predicting protein structures of the coronavirus. Similarly, Google’s BERT (Bidirectional Encoder Representations from Transformers) was a milestone in computers’ ability to process human language.

A major potential benefit of these proteins is that they could replace materials made from petroleum or ceramics while having a much smaller carbon footprint. The researchers, from MIT, the MIT-IBM Watson AI Lab, and Tufts University, employed a generative model, the same type of ML model architecture used in AI systems like DALL-E 2.

DALL-E 2 generates realistic images from natural language prompts. The researchers adapted this type of model architecture to predict amino acid sequences of proteins that achieve specific structural objectives. The models learn how proteins form and can produce new proteins that could enable unique applications.

This tool could be used to develop protein-inspired food coatings that keep produce fresh for longer while being safe for humans to eat. In addition, the models can generate millions of proteins in a few days, giving scientists a portfolio of new ideas to explore.

Novel tools 

Proteins are formed by chains of amino acids folded together in 3D patterns. The sequence of amino acids determines the mechanical properties of the protein. While scientists have identified thousands of proteins created through evolution, they estimate that an enormous number of amino acid sequences remain undiscovered.
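To give a rough sense of how large that undiscovered space is, the short sketch below counts the sequences that can be built from the 20 standard amino acids at a few chain lengths. The lengths chosen are illustrative assumptions, and the count ignores whether a sequence would fold or be stable.

```python
# The 20 standard amino acids, by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_size(length: int) -> int:
    """Number of distinct sequences of the given length over 20 amino acids."""
    return len(AMINO_ACIDS) ** length


for n in (10, 50, 100):  # illustrative chain lengths, not taken from the study
    size = sequence_space_size(n)
    # Report the order of magnitude rather than the full integer.
    print(f"length {n:>3}: about 10^{len(str(size)) - 1} possible sequences")
```

Even a modest chain of 100 amino acids allows on the order of 10^130 sequences, which is why exhaustive search is not an option and generative models are attractive.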

To streamline protein discovery, researchers have recently developed deep learning models that can predict the 3D structure of a protein from its amino acid sequence. But the inverse problem, predicting an amino acid sequence that meets given structural design targets, has proven more challenging.

According to Buehler, a member of the MIT-IBM Watson AI Lab, attention-based models can learn very long-range relationships, which is key to developing proteins because one mutation in a long amino acid sequence can make or break the entire design. The generative approach used here is diffusion: a diffusion model learns to generate new data by adding noise to training data and then learning to recover the data by removing that noise. Diffusion models are often more effective than other models at producing high-quality, realistic data.
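The noising-and-denoising idea behind diffusion models can be sketched in a few lines. The example below is a minimal illustration, not the researchers' model: it assumes a one-hot encoding of a made-up sequence, uses the standard closed-form forward step x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise, and stands in for a trained denoiser by reusing the true noise. The sequence, the noise levels and all function names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector."""
    return [[1.0 if aa == letter else 0.0 for letter in AMINO_ACIDS] for aa in sequence]


def add_noise(x0, alpha_bar, rng):
    """Forward process: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    xt = [[signal * v + sigma * e for v, e in zip(row, eps)]
          for row, eps in zip(x0, noise)]
    return xt, noise


def remove_noise(xt, predicted_noise, alpha_bar):
    """Invert the forward step given a noise estimate (requires alpha_bar > 0)."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [[(v - sigma * e) / signal for v, e in zip(row, eps)]
            for row, eps in zip(xt, predicted_noise)]


def decode(x):
    """Map each position back to the amino acid with the largest value."""
    return "".join(AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)] for row in x)


rng = random.Random(0)
sequence = "MKTAYIAKQR"  # hypothetical example sequence
x0 = one_hot(sequence)

for alpha_bar in (0.9, 0.5, 0.1):  # smaller alpha_bar means more noise
    xt, noise = add_noise(x0, alpha_bar, rng)
    # A trained network would predict the noise from xt; here the true noise
    # is reused to show what a perfect denoiser would recover.
    recovered = remove_noise(xt, noise, alpha_bar)
    print(f"alpha_bar={alpha_bar}: noisy -> {decode(xt)}, denoised -> {decode(recovered)}")
```

In a real diffusion model the denoiser is a neural network trained to estimate the noise, and generation starts from pure noise and applies that network step by step.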

The MIT researchers developed two models. One operates on the overall structural properties of the protein, and the other operates at the amino acid level. Both generate proteins by combining amino acid structures. The models are connected to an algorithm that predicts protein folding, which the researchers use to determine each generated protein’s 3D structure.

Realistic designs 

The researchers tested their models by comparing the generated proteins to known proteins with similar structural properties. Many overlapped with existing amino acid sequences, by about 50 to 60 per cent in most cases, while some had entirely new sequences. According to Buehler, the level of similarity suggests that many of the generated proteins are synthesizable.
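Comparing a generated protein with a known one comes down to some measure of sequence similarity. The sketch below computes a simple percent identity between two sequences that are assumed to be already aligned and of equal length. It is an illustration only, not the comparison method used in the study, and both sequences are made up.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions holding the same residue in two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


generated = "MKTAYIAKQR"  # hypothetical generated sequence
known = "MKSAYLAKHR"      # hypothetical known sequence

print(f"identity: {percent_identity(generated, known):.1f}%")
```

In practice, sequences of different lengths are first aligned with a dedicated alignment tool before identity is measured.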

Even when the researchers tried to trick the models with physically impossible design targets, the models generated the closest synthesizable solution rather than improbable proteins. A researcher stated that the learning algorithm could pick up the hidden relationships in nature. The team believes that whatever comes out of the model is very likely to be realistic.

The team plans to experimentally validate some of the new protein designs by making them in a lab. They also want to continue augmenting and refining the models so they can develop amino acid sequences that meet more criteria. 

