A new artificial intelligence model developed by researchers at the University of Texas at Austin could lead to more effective and less toxic treatments and new preventive strategies in medicine. The AI model informs the design of protein-based therapies and vaccines by leveraging the underlying logic of nature's evolutionary processes.

The AI advance, called EvoRank, offers a new and tangible example of how AI may help bring disruptive change to biomedical research and biotechnology more broadly. Scientists described the work at the International Conference on Learning Representations (ICLR 2024) and published a related paper in Nature Communications about leveraging a broader AI framework to identify useful mutations in proteins.

A major obstacle to designing better protein-based biotechnologies is having enough experimental data about proteins to adequately train AI models to understand how specific proteins work and, thus, how to engineer them for specific purposes.

The key insight with EvoRank is to harness the natural variations of millions of proteins generated by evolution over deep time and extract the underlying dynamics needed for workable solutions to biotech challenges.

"Nature has been evolving proteins for 3 billion years, mutating or swapping out amino acids and keeping those that benefit living things," said Daniel Diaz, a research scientist in computer science and co-lead of the Deep Proteins group, an interdisciplinary team of computer science and chemistry experts at UT.

AI in Protein Engineering

"EvoRank learns how to rank the evolution that we observe around us, to essentially distill the principles that determine protein evolution and to use those principles so they can guide the development of new protein-based applications, including for drug development and vaccines, as well as a wide range of biomanufacturing purposes," Diaz said.

UT is home to one of the country's leading programs for AI research and houses the Institute for Foundations of Machine Learning (IFML), which is led by computer science professor Adam Klivans, who also co-leads Deep Proteins.

A project involving Deep Proteins and vaccine-maker Jason McLellan, a UT professor of molecular biosciences, in collaboration with the La Jolla Institute for Immunology, will apply AI-guided protein engineering to the development of vaccines against herpes viruses.

"Engineering proteins with capabilities that natural proteins do not have is a recurring grand challenge in the life sciences," Klivans said. "It also happens to be the type of task that generative AI models are made for, as they can synthesize large databases of known biochemistry and then generate new designs."

Unlike Google DeepMind's AlphaFold, which applies AI to predict the shape and structure of proteins based on each one's sequence of amino acids, the Deep Proteins group's AI systems suggest how best to make alterations in proteins for specific functions, such as improving the ease with which a protein can be developed into new biotechnologies.

McLellan's lab is already synthesizing different versions of viral proteins based on AI-generated designs and then testing their stability and other properties.

Protein Therapeutics

Protein therapeutics often have fewer side effects and can be safer and more effective than the alternatives. The global industry, estimated at $400 billion today, is primed to grow more than 50% during the next decade. Still, developing a protein-based drug is slow, costly, and risky.

An estimated $1 billion or more is needed for the decade-plus journey from drug design to completing clinical trials; even then, the odds of securing approval from the Food and Drug Administration for a company's new drug are only about 1 in 10.

What's more, to be useful in therapeutics, proteins often need to be genetically engineered, for example, to ensure their stability or to allow them to be produced at the yields needed for drug development. Cumbersome trial and error in labs has traditionally dictated such genetic engineering decisions.

Using existing databases of naturally occurring protein sequences, the researchers who created EvoRank essentially lined up different versions of the same protein that appear in different organisms—from starfish to oak trees to humans—and compared them.

At any given position in the protein, there might be one of several different amino acids that evolution has found to be useful. Nature might select, say, tyrosine 36% of the time, histidine 29% of the time and lysine 14% of the time, and, even more importantly, never leucine.
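To make the idea of per-position frequencies concrete, here is a minimal Python sketch. It is not the researchers' code: the toy alignment, the sequences in it, and the function name `column_frequencies` are all hypothetical, and it assumes the sequences are already aligned so that the same index refers to the same position in every version of the protein.

```python
from collections import Counter

# Hypothetical toy alignment: each string stands for the same protein from a
# different organism, already aligned position by position. Real alignments
# contain far more sequences than this.
toy_alignment = [
    "MKYA",
    "MKYA",
    "MKHA",
    "MRYA",
    "MKHG",
    "MKKA",
]

def column_frequencies(alignment, position):
    """Return {amino_acid: fraction of sequences} for one aligned position."""
    column = [seq[position] for seq in alignment]
    counts = Counter(column)
    total = len(column)
    return {aa: n / total for aa, n in counts.items()}

# Position 2 holds Y, Y, H, Y, H, K across the six toy sequences.
freqs = column_frequencies(toy_alignment, 2)
for aa, fraction in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{aa}: {fraction:.0%}")

# Amino acids that never appear at this position (leucine, "L", in this toy
# data) are simply absent from the result.
print("L observed:", "L" in freqs)
```

In this toy data, tyrosine (Y) is the most common residue at the chosen position and leucine never appears, which mirrors the kind of pattern described above.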

The team uses all of this information to train the new machine learning algorithm. Based on continuous feedback, the model learns which amino acids nature has favored in the past when evolving proteins, and it bases its understanding on what is plausible in nature and what is not.
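The article does not describe the training objective in detail, so the following Python sketch shows only one plausible way that observed frequencies could be turned into a ranking signal: as pairs of the form "evolution preferred A over B at this position." The function name `preference_pairs` is hypothetical, and the example uses only the three percentages quoted above, treating every other amino acid as unobserved purely for illustration.

```python
from itertools import combinations

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def preference_pairs(freqs):
    """Return (preferred, less_preferred) pairs for one position.

    A pair is emitted whenever one amino acid was observed more often than
    another; amino acids with equal frequencies produce no pair.
    """
    full = {aa: freqs.get(aa, 0.0) for aa in AMINO_ACIDS}
    pairs = []
    for a, b in combinations(AMINO_ACIDS, 2):
        if full[a] > full[b]:
            pairs.append((a, b))
        elif full[b] > full[a]:
            pairs.append((b, a))
    return pairs

# Illustrative frequencies taken from the example in the text; the remaining
# share is not specified there, so other amino acids are assumed unobserved.
example_freqs = {"Y": 0.36, "H": 0.29, "K": 0.14}
pairs = preference_pairs(example_freqs)

print(("Y", "H") in pairs)  # tyrosine ranked above histidine
print(("Y", "L") in pairs)  # tyrosine ranked above never-seen leucine
```

A model trained on pairs like these would be rewarded for scoring the preferred amino acid higher than the less preferred one, which is one common way to frame a learning-to-rank problem. Whether EvoRank uses this exact formulation is not stated in the article.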
