Deep inside our bodies, and deeper still inside our cells, billions of tiny engines called proteins carry out functions that are essential to life itself. A protein is a chain of amino acids, drawn from a set of twenty types and linked together by peptide bonds. As an unfolded strand, a protein can do little. It gains its abilities when it folds into a three-dimensional structure as a result of the interactions between its amino acids. Interestingly, this folded structure determines the exact function the protein will carry out.

Determining or predicting this three-dimensional folded structure is critical for the advancement of the biosciences. A better understanding of how proteins fold and misfold will help us treat diseases and produce more effective medicines and vaccines. A shortage of properly folded proteins in the body can lead to a range of diseases, while clumps formed by the aggregation of misfolded proteins are believed to contribute to conditions such as Alzheimer's disease and some cancers.

However, there is one problem: how can we determine the structure that each of the more than 400,000 proteins will fold into? This is called the Protein Folding Problem. On November 30th, after 50 years, an AI system called AlphaFold, developed by DeepMind, finally cracked it.

By the end of the 1960s, molecular biologists had observed that a protein's fold is determined solely by the interactions between its amino acids. This led Christian Anfinsen to famously postulate, in his 1972 Nobel Prize acceptance speech, that a protein's amino acid sequence should fully determine its structure. But that was just the beginning of the problem.

Around the same time, the renowned molecular biologist Cyrus Levinthal pointed out that, because an unfolded polypeptide chain has a very large number of degrees of freedom, the molecule can adopt an astronomical number of conformations: an estimated 10^300 for a typical protein. Enumerating all of them by brute-force calculation would take longer than the age of the known universe. Yet in nature, proteins fold into their correct structures almost instantaneously. This is known as Levinthal's Paradox.
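To get a feel for Levinthal's argument, the arithmetic can be sketched in a few lines of Python. The 10^300 figure comes from the text above; the sampling rate of one conformation per picosecond is an illustrative assumption, not a figure from the article. The calculation works in base-10 logarithms so the enormous numbers stay manageable, and the function names are hypothetical.

```python
import math

# From the text: roughly 10^300 possible conformations for a typical protein.
LOG10_CONFORMATIONS = 300

# Assumption (illustrative only): the chain samples one conformation per
# picosecond, i.e. 10^12 conformations per second.
LOG10_SAMPLES_PER_SECOND = 12

# Age of the universe: roughly 13.8 billion years, converted to seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR


def log10_search_time_seconds(log10_conformations: float, log10_rate: float) -> float:
    """Return log10 of the time (in seconds) needed to visit every conformation once."""
    return log10_conformations - log10_rate


def log10_universe_ages(log10_conformations: float, log10_rate: float) -> float:
    """Return log10 of how many universe lifetimes the brute-force search would take."""
    seconds_log10 = log10_search_time_seconds(log10_conformations, log10_rate)
    return seconds_log10 - math.log10(AGE_OF_UNIVERSE_S)


if __name__ == "__main__":
    t = log10_search_time_seconds(LOG10_CONFORMATIONS, LOG10_SAMPLES_PER_SECOND)
    ages = log10_universe_ages(LOG10_CONFORMATIONS, LOG10_SAMPLES_PER_SECOND)
    print(f"search time ~ 10^{t:.0f} s")
    print(f"~ 10^{ages:.1f} times the age of the universe")
```

Under these assumptions the search would take about 10^288 seconds, roughly 10^270 times the age of the universe. Even a sampling rate a billion times faster would only shave 9 off that exponent, which is why proteins cannot be finding their folds by trying every option.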

The first complete fold of a protein was determined using a technique called X-ray crystallography, which has since produced most of the protein structures we know. Another method is cryo-electron microscopy. Together they are the gold standard for determining protein structures, despite being laborious and expensive.

However, expectations rose sharply when computational methods were introduced in the 1980s. In 1994, Professor John Moult and Professor Krzysztof Fidelis founded CASP, a biennial blind assessment to catalyse research, monitor progress, and establish the state of the art in protein structure prediction; it is now the gold standard for assessing predictive techniques. CASP selects as targets proteins whose structures have only very recently been determined experimentally and have not yet been made public, and teams test their prediction methods against them. Participants must predict the structures blindly, and their predictions are then compared with the experimental data once it becomes available.

The main metric CASP uses to measure prediction accuracy is the Global Distance Test (GDT), which ranges from 0 to 100 and, roughly speaking, measures the percentage of amino acid residues that lie within a threshold distance of their correct positions. A score of around 90 GDT is considered competitive with results obtained from experimental methods.
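A common form of the score, GDT_TS, averages the percentage of residues lying within 1, 2, 4 and 8 angstroms of their experimental positions. The Python sketch below is a simplified illustration under stated assumptions: each structure is represented by one C-alpha coordinate per residue, and the two structures are assumed to be already superimposed. The real GDT searches over many superpositions to maximize each percentage, which this sketch skips. The function name gdt_ts is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

# GDT_TS distance cutoffs, in angstroms.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted: Sequence[Point], experimental: Sequence[Point]) -> float:
    """Simplified GDT_TS score (0-100) for two already-superimposed structures.

    Each point is one residue's C-alpha position. Unlike the real metric,
    no search over alternative superpositions is performed here.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean over the four cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)


if __name__ == "__main__":
    experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    # Displace each predicted residue along x by 0.5, 1.5, 3.0 and 10.0 angstroms.
    shifts = (0.5, 1.5, 3.0, 10.0)
    predicted = [(x + s, y, z) for (x, y, z), s in zip(experimental, shifts)]
    print(gdt_ts(predicted, experimental))  # 56.25
```

In the example, one residue falls within 1 angstrom, two within 2, and three within both 4 and 8, so the score is 100 × (0.25 + 0.5 + 0.75 + 0.75) / 4 = 56.25. A perfect prediction scores 100.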

In 2018, after achieving some major landmarks in AI history, such as defeating champion Lee Sedol at the game of Go in 2016, Google-owned DeepMind turned its attention to this 50-year-old problem by entering the thirteenth edition of CASP, where it made significant strides with its AI system AlphaFold. Earlier this year, the company also repurposed AlphaFold to provide structure predictions for several proteins of SARS-CoV-2, the virus that has caused the COVID-19 global pandemic.

It was at the 14th CASP, held this year, that AlphaFold achieved its biggest accomplishment. According to the published assessment, AlphaFold achieved a median score of 92.4 GDT across all targets. This means its predictions have an average error (RMSD) of approximately 1.6 angstroms, which is comparable to the width of an atom (roughly 0.1 nanometers).
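RMSD (root-mean-square deviation) is the square root of the average squared distance between matching atoms in the predicted and experimental structures. The Python sketch below assumes the two structures are already optimally superimposed; a full comparison would first align them, for example with the Kabsch algorithm, which is not shown. The function name rmsd and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

ANGSTROM_IN_NM = 0.1  # 1 angstrom = 0.1 nanometer


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation (angstroms) between matched atom positions.

    Assumes both structures are already superimposed; no alignment is done here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(a, b)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    # Each modelled atom is displaced by exactly 1 angstrom.
    model = [(0.0, 1.0, 0.0), (3.8, -1.0, 0.0)]
    value = rmsd(model, reference)
    print(f"RMSD = {value:.2f} angstrom = {value * ANGSTROM_IN_NM:.3f} nm")
```

By the same conversion, AlphaFold's reported average error of about 1.6 angstroms corresponds to roughly 0.16 nanometers.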

“This is a big deal; in some sense, the problem is solved,” stated John Moult, a computational biologist at the University of Maryland in College Park, who co-founded CASP.  

The first version of AlphaFold, used at CASP13, worked by training neural networks to predict the distances between pairs of amino acids as well as the angles between the chemical bonds that connect those amino acids. It then combined these predictions into a scoring function and used gradient descent to search the protein landscape for structures that matched them. However, the details of the approach behind the CASP14 result have yet to be published.
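The idea of scoring a structure against predicted distances and improving it by gradient descent can be illustrated with a toy example. The Python sketch below is not DeepMind's method: it assumes a set of exact target distances (standing in for a network's predictions, which in AlphaFold were probabilistic and also covered angles), scores a candidate structure by the squared error between its pairwise distances and the targets, and moves the atoms downhill along the gradient of that score. All function names, the four-point structure (in arbitrary units), and the learning rate are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def pairwise_distances(coords: Sequence[Vec]) -> List[List[float]]:
    """Euclidean distance between every pair of points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def loss_and_grad(coords: Sequence[Vec], target: Sequence[Sequence[float]]) -> Tuple[float, List[Vec]]:
    """Sum of squared errors between current and target pairwise distances, plus its gradient."""
    n = len(coords)
    loss = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [coords[i][k] - coords[j][k] for k in range(3)]
            r = max(math.sqrt(sum(d * d for d in delta)), 1e-9)  # guard against division by zero
            err = r - target[i][j]
            loss += err * err
            for k in range(3):
                g = 2.0 * err * delta[k] / r  # derivative of err^2 with respect to coords[i][k]
                grad[i][k] += g
                grad[j][k] -= g  # equal and opposite for the partner point
    return loss, grad


def fold(target: Sequence[Sequence[float]], start: Sequence[Vec], lr: float = 0.01, steps: int = 2000) -> List[Vec]:
    """Plain gradient descent on the distance-matching score."""
    coords = [list(p) for p in start]
    for _ in range(steps):
        _, grad = loss_and_grad(coords, target)
        for i in range(len(coords)):
            for k in range(3):
                coords[i][k] -= lr * grad[i][k]
    return coords


if __name__ == "__main__":
    # Hypothetical "true" structure: four points in arbitrary units.
    true_coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.5]]
    target = pairwise_distances(true_coords)  # stands in for predicted distances
    rng = random.Random(0)
    start = [[x + rng.uniform(-0.5, 0.5) for x in p] for p in true_coords]
    before, _ = loss_and_grad(start, target)
    after, _ = loss_and_grad(fold(target, start), target)
    print(f"loss before: {before:.4f}, after: {after:.6f}")
```

Because the score depends only on distances, any rotated, translated or mirrored copy of the true structure scores equally well; real structure-prediction pipelines need additional terms, such as angle predictions, to resolve such ambiguities.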

"This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research," said Nobel Laureate Professor Venki Ramakrishnan.

According to DeepMind, "the system was trained on publicly available data consisting of 170,000 protein structures from the protein data bank together with large databases containing protein sequences of unknown structure. It uses approximately 16 TPUv3s (which is 128 TPUv3 cores or roughly equivalent to ~100-200 GPUs) run over a few weeks, a relatively modest amount of compute in the context of most large state-of-the-art models used in machine learning today."

Professor Andrei Lupas, director of the Max Planck Institute for Developmental Biology and a CASP assessor, noted that “AlphaFold’s astonishingly accurate models have allowed us to solve a protein structure we were stuck on for close to a decade, relaunching our effort to understand how signals are transmitted across cell membranes.” "This is going to empower a new generation of molecular biologists to ask more advanced questions," he added.

In some cases in the CASP assessment, AlphaFold’s structure predictions were indistinguishable from those determined using experimental methods such as X-ray crystallography and cryo-electron microscopy.

Demis Hassabis, DeepMind’s co-founder and chief executive, says that the company plans to make AlphaFold useful so that other scientists can employ it. He added that tackling grand scientific challenges, such as protein-structure prediction, is one of the most important applications of its AI.

However, AlphaFold's accomplishments are not without drawbacks. It can take AlphaFold days to produce a predicted structure. By contrast, computational biologist Mohammed AlQuraishi’s system, which uses an algorithm called a recurrent geometric network (RGN), can predict protein structures a million times faster, returning results in seconds, although its predictions are less accurate.

Nevertheless, AlphaFold's achievement marks another major milestone in AI research, and the real-world outcomes and benefits of this success will be groundbreaking.

DISCLAIMER

The information provided on this page has been procured through secondary sources. In case you would like to suggest any update, please write to us at support.ai@mail.nasscom.in