DeepMind, a subsidiary of Google-parent Alphabet, and the European Molecular Biology Laboratory (EMBL) have created the most exhaustive map of human proteins to date. The company plans to release this mammoth dataset to the world for free. This could significantly advance our understanding of proteins, the building blocks of life. Some scientists are comparing the potential impact of this work to that of the Human Genome Project, an international effort to map every human gene.

Proteins are responsible for numerous tasks in the body, from building body tissue to fighting off diseases. Each protein has a different structure, according to its 'job'. These structures, which look like origami shapes, are called protein folds. Understanding how a protein folds helps explain its function, which in turn helps scientists with a range of tasks, from pursuing fundamental research on how the body works to designing new medicines and treatments.

Previously, studying protein folds was an expensive and time-consuming process. Last year, however, DeepMind introduced AlphaFold, an artificial intelligence (AI) program that demonstrated it can produce accurate predictions of a protein’s structure. Now, the company is releasing hundreds of thousands of predictions made by the program to the public.

“I see this as the culmination of the entire 10-year-plus lifetime of DeepMind,” company CEO and co-founder Demis Hassabis told The Verge. “From the beginning, this is what we set out to do: to make breakthroughs in AI, test that on games like Go and Atari, [and] apply that to real-world problems, to see if we can accelerate scientific breakthroughs and use those to benefit humanity.”

Currently, 180,000 protein structures are available in the public domain, each produced by experimental methods and accessible through the Protein Data Bank. DeepMind will release AlphaFold predictions for 350,000 proteins across 20 different organisms, including animals like mice and fruit flies, and bacteria like E. coli. The repository will contain predictions for 98 percent of all human proteins, around 20,000 different structures, which are collectively known as the human proteome. It isn’t the first public dataset of human proteins, but it is the most comprehensive and accurate.

If they want, scientists can download the entire human proteome for themselves, says AlphaFold’s technical lead John Jumper. “There is a HumanProteome.zip effectively, I think it’s about 50 gigabytes in size,” says Jumper. “You can put it on a flash drive if you want, though it wouldn’t do you much good without a computer for analysis!”

The data will be free in perpetuity for both scientific and commercial researchers, says Hassabis. “Anyone can use it for anything,” the DeepMind CEO noted at a press briefing. “They just need to credit the people involved in the citation.”

DeepMind plans to expand the database, which will be maintained by Europe’s flagship life sciences lab, the European Molecular Biology Laboratory (EMBL). By the end of the year, DeepMind hopes to release predictions for 100 million protein structures, a dataset that will be “transformative for our understanding of how life works,” according to Edith Heard, director general of the EMBL.
