Today, the whole world is desperately seeking a magic wand for its fight against the novel coronavirus that has killed close to 200,000 people. Researchers across the globe are scrambling in search of a cure or a vaccine that can halt, or at least slow down, this "samhara tandava" (the dance of annihilation) of the virus scientifically known as SARS-CoV-2.

According to Sun Tzu, the great Chinese general, military strategist and the author of the 2500-year-old treatise titled 'The Art of War', "if you know the enemy and know yourself, you need not fear the result of a hundred battles." But, "if you know yourself but not the enemy, for every victory gained, you will also suffer a defeat."

Today, it is the unknownness of the virus that is leading us to one defeat after another in humanity's war against COVID-19. The "invisibility" of the virus, the unpredictability of its behaviour, and the mystery of its structure have kept the scientific community from gaining any significant victory.

The structure of a virus 

A virus, when it exists outside a cell, consists mainly of two parts. The first is genetic material that encodes the structure of its proteins, which determine how the virus acts. The second is a protein coat called the capsid, which surrounds and protects the genetic material. Proteins are amino acids polymerised into long chains in different combinations. According to scientists, there can be over a billion types of proteins, each performing a different task, and the same holds for the proteins in viruses. In the coronavirus, one key feature is its spike-like surface proteins, which physically latch on to the receptor proteins of our lung cells.

A protein's ability to function depends on its physical three-dimensional fold. Understanding the fold of a protein helps us understand its behaviour, which is crucial for developing any drug to cure COVID-19. As pointed out by neuroscientist Shelly Xuelai Fan, "if a drug is going to fit into a protein like a key into a lock to trigger a whole cascade of nasty reactions, then the first step is to figure out the structure of the lock."

In other words, "trying to understand the virus without a 3D molecular structure is like trying to build a house without the blueprint design", writes Laura Foster of Tech UK, Britain's leading technology membership organisation. However, "the difficulty is that the 3D molecular structure of a protein can be complicated to predict because of the near-unlimited shapes this can take," Foster noted.

One effective way to solve this puzzle is to run computer simulations. Exhaustive search is not an option: as Levinthal's paradox suggests, testing every structural possibility of a protein molecule would take longer than the current age of the universe. Advanced algorithms and supercomputers are already running simulations to find the protein structures of the SARS-CoV-2 virus. One example is the famous Folding@Home initiative from Stanford University, which lets us donate the idle GPUs of our computers to run these simulations.
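To get a feel for the scale behind Levinthal's paradox, here is a rough back-of-the-envelope sketch. The chain length, the number of conformations per residue, and the sampling rate are illustrative assumptions, not figures from this article.

```python
# Back-of-the-envelope Levinthal estimate.
# All three inputs below are illustrative assumptions.
residues = 100               # assumed length of a small protein chain
states_per_residue = 3       # assumed number of conformations per residue
samples_per_second = 1e13    # assumed rate of trying conformations

# Total number of conformations if every residue varies independently.
conformations = states_per_residue ** residues

# Time needed to try each conformation exactly once.
seconds_per_year = 365.25 * 24 * 3600
years_needed = conformations / samples_per_second / seconds_per_year

age_of_universe_years = 1.38e10  # roughly 13.8 billion years

print(f"conformations to test: {conformations:.2e}")
print(f"years to test them all: {years_needed:.2e}")
print(f"multiples of the universe's age: {years_needed / age_of_universe_years:.2e}")
```

Even with these generous assumptions, the search would outlast the universe many times over, which is why smarter prediction methods matter.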

However, a much faster and more efficient solution to these challenges could be AI: two strong initiatives, from Google-owned DeepMind and "Chinese Google" Baidu, have made remarkable strides towards deciphering the structure and nature of the COVID virus.

AlphaFold: Predicting the viral protein fold

DeepMind repurposed its AlphaFold system to predict the protein structure of the coronavirus. Founded by Demis Hassabis, a former chess and Go prodigy and Cambridge-trained neuroscientist, DeepMind has often astonished the world with its giant leaps in deep learning, notably the landmark victory of its Go-playing agent AlphaGo over then world champion Lee Sedol in 2016.

AlphaFold, a deep learning system, focuses on predicting protein structure accurately when no structures of similar proteins are available. This process is called "free modelling", and it was accomplished through training neural networks to predict the shape of a protein from its genetic sequence. 

To understand how AlphaFold predicts the protein structure of the coronavirus, one must understand how it predicts protein structure in general, a feat that made its creators champions at the 13th Critical Assessment of Protein Structure Prediction (CASP13), a biennial global competition.

AI has been used to determine a protein's folding structure before, but AlphaFold's 3D model predictions achieved unprecedented accuracy. The system relies on neural networks that predict properties of the protein from its genetic sequence, such as the distances between pairs of amino acids and the angles between the chemical bonds that connect those amino acids. One network was trained to predict a distribution of distances between every pair of residues in a protein. "These probabilities were then combined into a score that estimates how accurate a proposed protein structure is," as noted by the AlphaFold team in a blog post. To build structures that score well, AlphaFold combined two methods. The first built on a technique that repeatedly replaces pieces of a protein structure with new protein fragments; here, a generative neural network was trained to invent those fragments.
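The idea of combining predicted distance probabilities into a single score can be sketched as follows. This is a toy illustration, not DeepMind's code: the negative log-likelihood formula, the function names, the three distance bins, and the probability values are all assumptions chosen for clarity.

```python
import math

def pairwise_distances(coords):
    """Return {(i, j): Euclidean distance} for all residue pairs with i < j."""
    dists = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dists[(i, j)] = math.dist(coords[i], coords[j])
    return dists

def structure_score(coords, predicted, bin_edges):
    """Negative log-likelihood of a structure under predicted distance histograms.

    predicted[(i, j)] holds one probability per distance bin (hypothetical
    network output). A lower score means better agreement with the predictions.
    """
    score = 0.0
    for pair, d in pairwise_distances(coords).items():
        # Locate the bin containing d; distances past the last edge use the last bin.
        bin_index = len(bin_edges) - 2
        for k in range(len(bin_edges) - 1):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                bin_index = k
                break
        score += -math.log(predicted[pair][bin_index] + 1e-9)
    return score

# Hypothetical predictions for a three-residue chain, bins in angstroms.
bin_edges = [0.0, 4.0, 8.0, 12.0]
predicted = {
    (0, 1): [0.80, 0.15, 0.05],  # neighbours are probably under 4 apart
    (1, 2): [0.80, 0.15, 0.05],
    (0, 2): [0.10, 0.80, 0.10],  # the two ends are probably 4 to 8 apart
}

extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
collapsed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.5, 0.0, 0.0)]

score_extended = structure_score(extended, predicted, bin_edges)
score_collapsed = structure_score(collapsed, predicted, bin_edges)
print(score_extended, score_collapsed)
```

The extended candidate matches every predicted distance bin, so it receives the lower (better) score, while the collapsed candidate is penalised for placing the two ends too close together.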

The second method optimised these scores through gradient descent, a technique used in machine learning for making small, incremental improvements, resulting in highly accurate structures. "This technique was applied to entire protein chains rather than to pieces that must be folded separately before being assembled into a larger structure, to simplify the prediction process," write the researchers.
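Gradient descent on a whole chain can be illustrated with a deliberately small example. The sketch below assumes a simple squared-error loss against target distances in two dimensions; the loss, the step size, the target values, and the function names are illustrative assumptions and do not reproduce AlphaFold's actual scoring function.

```python
import math

def loss_and_gradient(coords, targets):
    """Squared error between current and target pair distances, with its gradient."""
    loss = 0.0
    grad = [[0.0, 0.0] for _ in coords]
    for (i, j), target in targets.items():
        dx = coords[i][0] - coords[j][0]
        dy = coords[i][1] - coords[j][1]
        d = math.hypot(dx, dy) + 1e-9  # small offset avoids division by zero
        err = d - target
        loss += err * err
        gx = 2 * err * dx / d
        gy = 2 * err * dy / d
        grad[i][0] += gx
        grad[i][1] += gy
        grad[j][0] -= gx
        grad[j][1] -= gy
    return loss, grad

def gradient_descent(coords, targets, step=0.05, iterations=2000):
    """Move all points together in small steps that reduce the loss."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        _, grad = loss_and_gradient(coords, targets)
        for point, g in zip(coords, grad):
            point[0] -= step * g[0]
            point[1] -= step * g[1]
    return coords

# Hypothetical target distances for a three-residue chain (arbitrary units).
targets = {(0, 1): 3.8, (1, 2): 3.8, (0, 2): 5.0}
start = [(0.0, 0.0), (1.0, 0.5), (2.0, -0.3)]

initial_loss, _ = loss_and_gradient(start, targets)
folded = gradient_descent(start, targets)
final_loss, _ = loss_and_gradient(folded, targets)
print(initial_loss, final_loss)
```

Note that every point is updated in each step, which mirrors the idea of adjusting the entire chain at once rather than folding pieces separately.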

To predict the structure of the spike protein of the coronavirus, DeepMind deployed an updated version of AlphaFold, which includes per-residue confidence scores to help indicate which parts of the structure are more likely to be correct. According to the researchers, AlphaFold "provided an accurate prediction for the experimentally determined SARS-CoV-2 spike protein structure shared in the Protein Data Bank." These results were verified by the Francis Crick Institute in the UK and later released to the general scientific community.

Steve Gamblin, the Director of Scientific Platforms at Francis Crick Institute in the UK, points out that "making these protein structures freely available will provide an important resource to the global research community in understanding the disease process and developing approaches to combat it."

On the other hand, the team cautions that "these structure predictions have not been experimentally verified, but hope they may contribute to the scientific community's interrogation of how the virus functions, and serve as a hypothesis generation platform for future experimental work in developing therapeutics."

LinearFold: Preventing the creation of viral proteins

The second major effort to decipher the coronavirus comes from the same land where the outbreak originated: China. Baidu, one of China's top tech giants, uses a machine-learned algorithm to predict the structure of another biomolecule of the virus, messenger RNA (mRNA). mRNA transports information from the genome to the cell's protein factories, and cutting it off "at the root" means no viral proteins are created in the first place.

To accomplish this, Baidu developed the LinearFold algorithm, which it has now made available to the scientific and medical teams fighting the outbreak. According to Baidu, "the algorithm is significantly faster than traditional RNA folding algorithms at predicting a virus's secondary RNA structure," and "analysing the secondary structural changes between homologous RNA virus sequences (such as bats and humans) can provide scientists with further insight into how viruses spread across species."

LinearFold was first introduced in a 2019 paper titled 'LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search', published in Bioinformatics. The paper states that, when tested on a diverse dataset of RNA sequences with well-established structures, LinearFold proved substantially more efficient and achieved higher average accuracy. In addition, LinearFold is more accurate on long-range base pairs, which are well known to be a challenging problem for current models.
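For context, the kind of traditional dynamic programming that linear-time methods aim to speed up can be sketched with the classic Nussinov base-pair maximisation. This is not LinearFold and uses no beam search; it is a simplified, cubic-time stand-in, and the pairing rules, minimum loop size, and example sequence are assumptions for illustration.

```python
# Allowed base pairs: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence.

    best[i][j] stores the optimum for the subsequence seq[i..j]. The three
    nested loops make the running time grow with the cube of the length.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Option 1: base j stays unpaired.
            score = best[i][j - 1]
            # Option 2: base j pairs with some base k, leaving at least
            # min_loop unpaired bases between them.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

# A short hairpin: three stacked pairs closed around an AAA loop.
print(max_base_pairs("GGGAAAUCC"))  # prints 3
```

The cubic cost is manageable for short sequences but becomes slow for viral genomes tens of thousands of bases long, which is the motivation for faster approximate approaches.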

Presently, there are numerous studies which state that the secondary structure of mRNA is correlated to its functional half-life and has an impact on protein translation. Access to quick viral structural analysis can significantly shorten the time it takes to design a potential mRNA vaccine with higher stability and better effectiveness, providing an opportunity to save thousands of lives. Along with opening the LinearFold algorithm to the broader community, Baidu collaborates with health and academic institutions to share computation resources, provide customised support, and help optimise mRNA vaccine design.

"We hope that this powerful ability can be quickly leveraged by our researchers and anti-epidemic experts, and work with society as a whole to help improve the speed of virus research and vaccine development," says Haifeng Wang, CTO at Baidu.

It remains to be seen in the coming months how capable these techniques are and how efficiently the scientific community can leverage them in the fight against the coronavirus. Coming back to protein, it is worth noting that the word has its origins in the Greek "proteios", which translates to something of primary importance. As with all life-sustaining functions, in the battle to conquer COVID-19 too, the protein of the virus has become the factor of utmost significance.

Sources of Article

Images from DeepMind and National Institute of Allergy and Infectious Diseases (USA)
