In drug design, compound potency prediction is a popular machine-learning application. Graph neural networks (GNNs) predict ligand affinity from graph representations of protein–ligand interactions, typically extracted from X-ray structures. Despite promising findings that have led to claims that GNNs can learn details of protein–ligand interactions, such predictions remain controversial. For example, evidence has been presented that GNNs might not learn protein–ligand interactions but rather memorize ligand and protein training data.
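To make the setup concrete, the following is a minimal sketch, assuming PyTorch Geometric, of how a protein–ligand complex might be encoded as a graph and scored by a GNN. The node features, edge construction, and architecture here are illustrative placeholders, not the models or datasets used in the study.

```python
# Minimal illustrative sketch: a protein-ligand complex as a graph,
# scored by a small GNN. All names and dimensions are placeholders.
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class AffinityGNN(torch.nn.Module):
    def __init__(self, in_dim=16, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.readout = torch.nn.Linear(hidden, 1)   # scalar affinity

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()         # message passing over bonds/contacts
        h = self.conv2(h, edge_index).relu()
        g = global_mean_pool(h, batch)               # pool atom embeddings -> one complex
        return self.readout(g).squeeze(-1)

# Toy interaction graph: nodes are ligand and binding-site atoms; edges are
# covalent bonds plus protein-ligand contacts within a distance cutoff.
x = torch.randn(10, 16)                              # 10 atoms, 16 features each
edge_index = torch.tensor([[0, 1, 2, 7], [1, 2, 7, 9]])  # directed edge list
batch = torch.zeros(10, dtype=torch.long)            # all atoms belong to one complex
model = AffinityGNN()
print(model(x, edge_index, batch))                   # predicted affinity (untrained model)
```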

A significant breakthrough has been made by Prof. Dr. Jürgen Bajorath and his team, cheminformatics experts at the University of Bonn. They have devised a technique that uncovers the operational mechanisms of specific AI systems used in pharmaceutical research. They have conducted affinity predictions with six GNN architectures on community-standard datasets and rationalized the predictions using explainable artificial intelligence. The results confirm a strong influence of ligand—but not protein—memorization during GNN learning and also show that some GNN architectures increasingly prioritize interaction information for predicting high affinities. Thus, while GNNs do not comprehensively account for protein–ligand interactions and physical reality, depending on the model, they balance ligand memorization with learning interaction patterns.

Drug discovery

Which drug molecule is most effective? Researchers are feverishly searching for effective active substances to combat diseases. These compounds often dock onto proteins, usually enzymes or receptors, thereby triggering a specific chain of physiological actions.

In some cases, certain molecules are also intended to block undesirable reactions in the body – such as an excessive inflammatory response. Given the abundance of available chemical compounds, this research can seem, at first glance, like searching for a needle in a haystack. Drug discovery, therefore, attempts to use scientific models to predict which molecules will dock best to the respective target protein and bind strongly. These potential drug candidates are then investigated in more detail in experimental studies.

According to Prof. Dr. Jürgen Bajorath, how GNNs arrive at their predictions is like a black box that cannot be glimpsed into.

AI Applications

The researchers analyzed six different GNN architectures using their specially developed "EdgeSHAPer" method and a conceptually different methodology for comparison. These computer programs "screen" whether the GNNs learn the most critical interactions between a compound and a protein and thereby predict the potency of the ligand, as intended and anticipated by the researchers – or whether the AI arrives at its predictions in other ways.
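For intuition, the following is a simplified, hypothetical sketch of the idea behind Shapley-value-based edge attribution, the principle underlying EdgeSHAPer: each edge of the interaction graph is credited with its average marginal contribution to the model's prediction over randomly ordered edge subsets. The function names and the toy predictor are assumptions for illustration; this is not the published EdgeSHAPer implementation.

```python
# Illustrative Monte Carlo (permutation-sampling) Shapley estimate for edges.
# `predict` stands in for a trained GNN evaluated on the graph restricted to
# the edges selected by `mask`; a toy additive predictor is used here.
import numpy as np

def shapley_edge_attributions(predict, num_edges, n_permutations=200, seed=0):
    rng = np.random.default_rng(seed)
    phi = np.zeros(num_edges)
    for _ in range(n_permutations):
        order = rng.permutation(num_edges)
        mask = np.zeros(num_edges, dtype=bool)
        prev = predict(mask)
        for e in order:
            mask[e] = True                   # add edge e to the current coalition
            cur = predict(mask)
            phi[e] += cur - prev             # marginal contribution of edge e
            prev = cur
    return phi / n_permutations              # average over sampled permutations

# Toy predictor: pretend edges 0 and 3 are protein-ligand contacts that drive
# the predicted affinity, while the remaining edges contribute little.
weights = np.array([1.0, 0.1, 0.1, 0.8, 0.05])
toy_predict = lambda mask: float(weights[mask].sum())

print(shapley_edge_attributions(toy_predict, num_edges=len(weights)))
```

In such a scheme, edges corresponding to genuine protein–ligand contacts should receive high attributions if the model actually relies on interaction information, which is what the study probes.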

The study's first author, PhD candidate Andrea Mastropietro from Sapienza University in Rome, who conducted part of his doctoral research in Prof. Bajorath's group in Bonn, stated that the GNNs depend on the data they are trained with.

According to the professor, if the GNNs do what they are expected to, they need to learn the interactions between the compound and target protein, and the predictions should be determined by prioritizing specific interactions.

"The development of methods for explaining predictions of complex models is an important area of AI research. There are also approaches for other network architectures, such as language models, that help better understand how machine learning arrives at its results," says Bajorath. He expects that exciting things will soon also happen in the "Explainable AI" field at the Lamarr Institute, where he is a PI and Chair of AI in the Life Sciences.

Sources:

Nature

Science Daily
