The evolution of viruses sometimes prevents scientists from producing effective vaccines against diseases such as influenza and HIV. Because these viruses mutate rapidly, they can elude the antibodies generated by vaccines; this process is called 'viral escape'.

To understand and predict these escapes, MIT researchers have devised a computational model based on natural language processing (NLP) models, which were originally created to analyse and predict language. The model can predict which sections of a viral surface protein are most likely to mutate in ways that enable viral escape. It also predicts which sections are less likely to mutate, and these could be good targets for new vaccines.

Different viruses mutate at different rates; HIV and influenza, for example, are among the fastest. For viral escape, a mutation must change the shape of the virus's surface proteins so that the antibodies generated by a vaccine can no longer bind to them. But the change can't be so drastic that the virus becomes non-functional.

“Viral escape is a big problem,” says Bonnie Berger, the Simons Professor of Mathematics and head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory. “Viral escape of the surface protein of influenza and the envelope surface protein of HIV are both highly responsible for the fact that we don’t have a universal flu vaccine, nor do we have a vaccine for HIV, both of which cause hundreds of thousands of deaths a year.”

The MIT team decided to capture these criteria with an NLP model. Such models analyse patterns in speech and language, such as which words are likely to occur together in a sequence. For example, a model can predict a next word that is grammatically correct and makes logical sense in a sentence such as “Sally ate eggs for …”. Here, a model might predict “breakfast” or “lunch.”
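The next-word idea can be illustrated with a very small sketch. This is not the researchers' model: it is a simple word-pair counter, and the corpus and function names below are made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real language model is trained on far more text.
CORPUS = [
    "sally ate eggs for breakfast",
    "sally ate eggs for breakfast",
    "tom ate eggs for lunch",
    "sally ate soup for dinner",
]

def train_bigram_counts(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_word, k=2):
    """Return up to k likely next words with their estimated probabilities."""
    followers = counts[prev_word]
    total = sum(followers.values())
    # An unseen word has no followers, so the result is an empty list.
    return [(word, n / total) for word, n in followers.most_common(k)]

counts = train_bigram_counts(CORPUS)
print(predict_next(counts, "for"))
```

In this toy corpus, “breakfast” follows “for” in two of four sentences, so it comes out as the most likely continuation. Modern language models replace the counting with a neural network that looks at much more context, but the goal of predicting a plausible next item is the same.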

To apply a language model to biological information such as genetic sequences, the researchers drew a few parallels between the two subjects, with the aim of capturing how a virus can remain functional while changing the shape of its proteins. Grammar became analogous to the rules that decide whether the protein encoded by a sequence is functional or not; semantics became analogous to whether a protein can change shape to evade antibodies.

“If a virus wants to escape the human immune system, it doesn’t want to mutate itself so that it dies or can’t replicate,” says Brian Hie, the study's lead author. “It wants to preserve fitness but disguise itself enough so that it’s undetectable by the human immune system.”
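One way to picture this trade-off is to score each candidate mutation twice, once for "grammaticality" (is the mutant still likely to be functional?) and once for "semantic change" (does it look different enough to evade antibodies?), and then favour mutations that score well on both. The sketch below assumes a simple rank-sum rule with a weight called beta; the rule, the names, and all the numbers are illustrative assumptions rather than details taken from the article.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    mutation: str           # hypothetical label for a candidate mutation
    grammaticality: float   # assumed score: how viable the mutant looks
    semantic_change: float  # assumed score: how different the mutant looks

def ranks(values):
    """Map each value to its rank, where 1 is the smallest value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def prioritise(candidates, beta=1.0):
    """Order candidates so that viable AND different mutants come first."""
    g = ranks([c.grammaticality for c in candidates])
    s = ranks([c.semantic_change for c in candidates])
    scored = [(s_r + beta * g_r, c) for s_r, g_r, c in zip(s, g, candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c.mutation for _, c in scored]

# Made-up scores for four hypothetical mutations.
candidates = [
    Candidate("viable_and_different", 0.60, 0.80),
    Candidate("viable_but_similar", 0.90, 0.10),
    Candidate("different_but_broken", 0.05, 0.95),
    Candidate("neither", 0.10, 0.20),
]
ordering = prioritise(candidates)
print(ordering[0])
```

With these made-up scores, the mutation that is both viable and different ranks first, while a mutation that changes a lot but probably breaks the virus does not, which mirrors the point in the quote above.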

The NLP model was trained to analyse patterns found in genetic sequences, which are easier to obtain than information on protein structures, and to predict new sequences that can have new functions but still follow the rules of protein structure. A highlight of this model is how easy it is to train: it requires relatively little data. For this study, the researchers used only 60,000 HIV sequences, 45,000 influenza sequences, and 4,000 coronavirus sequences.

“Language models are very powerful because they can learn this complex distributional structure and gain some insight into function just from sequence variation,” Hie says. “We have this big corpus of viral sequence data for each amino acid position, and the model learns these properties of amino acid co-occurrence and co-variation across the training data.”
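To make "co-variation" concrete, the sketch below measures how strongly two positions in a set of aligned sequences change together, using mutual information. This is a classical statistic shown purely for illustration, not the neural language model the researchers trained, and the four short sequences are made up.

```python
import math
from collections import Counter

def mutual_information(seqs, i, j):
    """Mutual information (in bits) between alignment positions i and j."""
    n = len(seqs)
    p_i = Counter(s[i] for s in seqs)
    p_j = Counter(s[j] for s in seqs)
    p_ij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), count in p_ij.items():
        joint = count / n
        mi += joint * math.log2(joint / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

# Made-up aligned sequences: positions 0 and 2 always change together
# (A with D, G with E), while position 3 varies independently of position 0.
SEQS = ["ACDK", "ACDR", "GCEK", "GCER"]

print(mutual_information(SEQS, 0, 2))  # positions that co-vary
print(mutual_information(SEQS, 0, 3))  # positions that vary independently
```

A high value means that knowing the amino acid at one position tells you a lot about the other. A language model trained on sequence data can pick up this kind of dependency, along with longer-range ones, without anyone specifying the statistic in advance.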

The study identified possible vaccine targets for influenza, HIV, and SARS-CoV-2. The researchers have also applied their model to the new variants of SARS-CoV-2 that recently surfaced in the United Kingdom and South Africa. The analysis flagged several viral genetic sequences that need further investigation to understand their potential to escape antibodies. However, this analysis has not yet been peer-reviewed.

The model's analysis of SARS-CoV-2 suggests that a part of the spike protein called the S2 subunit is the least likely to generate escape mutations and could therefore be targeted by effective vaccines. However, the researchers do not yet know how quickly the virus mutates, so it is not yet possible to predict how long vaccines will remain effective against COVID-19. While the virus doesn't evolve as fast as HIV and influenza, new strains have already appeared in Singapore, South Africa, and Malaysia.

While studying HIV, the researchers found that the V1-V2 hypervariable region has many possible escape mutations, a result that is consistent with previous findings. They also found sequences that would have a lower probability of escape.

“There are so many opportunities, and the beautiful thing is all we need is sequence data, which is easy to produce,” Bryson says.

The research was funded by a National Defense Science and Engineering Graduate Fellowship from the Department of Defense and a National Science Foundation Graduate Research Fellowship. The study was recently published in Science. Berger and her colleague Bryan Bryson, an assistant professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, are the senior authors of the study. MIT graduate student Brian Hie is the lead author; he was supported by Ellen Zhong, a PhD student at MIT's Computer Science and Artificial Intelligence Laboratory.
