These are some of the most intriguing AI research publications from the past year, spanning developments in artificial intelligence (AI) and data science. The list is organised chronologically, and each entry links to a longer article.

ImageBind: Holistic AI learning across six modalities

The researchers built ImageBind, the first AI model that combines information from six different modalities, and have made it openly available. The model learns a single embedding, or shared representation space, not just for text, image/video, and audio, but also for sensors that record depth (3D), thermal radiation (infrared), and inertial measurement units (IMUs), which capture motion and position. ImageBind thus gives machines a holistic sense of a scene: how the objects in a picture will move, what they will sound like, their 3D shape, whether they are warm or cold, and how they will feel.
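A shared embedding space like this can be illustrated with a toy sketch: each modality gets its own encoder mapping into the same d-dimensional space, and cross-modal retrieval then reduces to nearest-neighbour search by cosine similarity. The encoders below are random linear projections standing in for ImageBind's real (transformer-based) encoders, so this shows only the shape of the idea, not the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # shared embedding dimension (hypothetical)

def normalize(v):
    # Project onto the unit sphere so dot product = cosine similarity.
    return v / np.linalg.norm(v)

# Stand-in "encoders": fixed random projections, one per modality.
W_image = rng.standard_normal((d, 8))  # 8-dim toy image features
W_audio = rng.standard_normal((d, 6))  # 6-dim toy audio features

def embed_image(x):
    return normalize(W_image @ x)

def embed_audio(x):
    return normalize(W_audio @ x)

# Cross-modal retrieval: find the audio clip whose embedding lies
# closest to the image embedding in the shared space.
image_vec = embed_image(rng.standard_normal(8))
audio_bank = [embed_audio(rng.standard_normal(6)) for _ in range(3)]
scores = [float(image_vec @ a) for a in audio_bank]
best = int(np.argmax(scores))
```

In ImageBind itself the encoders are trained with contrastive objectives so that paired inputs (e.g. an image and its sound) land near each other; here the alignment is untrained and the sketch only demonstrates the retrieval mechanics.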

Their paper shows that ImageBind can outperform previous specialist models trained separately for each modality. More importantly, it advances AI by making it easier for machines to analyse many kinds of information together. For example, Meta's Make-A-Scene could use ImageBind to generate a jungle or market scene from sounds alone. Other future possibilities include more accurate ways to recognise, connect, and moderate content, as well as improvements to creative design, such as generating richer media more easily and building search functions that span more than one modality.

CrysGNN: Distilling pre-trained knowledge to enhance property prediction for crystalline materials

In recent years, graph neural network (GNN) based approaches have emerged as an effective method for encoding the complex topological structure of crystal materials within an enriched representation space. These models use property-specific training data to discover the relationship between crystal structure and various properties, such as formation energy, bandgap, and bulk modulus. Unfortunately, most of these methods require a substantial quantity of property-tagged data to train the system, which may not be accessible for all properties. Nonetheless, much information is available regarding the chemical composition and structural bonds of crystals. This paper introduces CrysGNN, a new pre-trained GNN framework for crystalline materials that captures both node and graph-level structural information of crystal graphs by leveraging a vast quantity of unlabeled material data. 
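The pre-training idea, learning node- and graph-level structure from unlabeled crystals, can be sketched with a toy self-supervised objective: hide one node's features and reconstruct them from its neighbours. The linear "decoder" and random data below are illustrative stand-ins for the paper's GNN; only the shape of the objective is the point.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))          # 4 nodes (atoms), 3 features each
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # toy adjacency matrix

masked = 0                                # node whose features we hide
# Mean of the masked node's neighbour features (one step of message passing).
neigh = (A[masked] @ X) / A[masked].sum()
W = rng.standard_normal((3, 3))           # toy "decoder" in place of a GNN
recon = W @ neigh
loss = float(np.mean((recon - X[masked]) ** 2))  # reconstruction objective
```

Training such an objective over a large unlabeled corpus is what lets the encoder capture structural regularities without any property labels.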

In addition, the researchers distil knowledge from CrysGNN into various state-of-the-art (SOTA) property predictors to improve their accuracy. Extensive experiments demonstrate that, with knowledge distilled from the pre-trained model, all SOTA algorithms outperform their conventional counterparts by significant margins. The researchers also note that distillation yields a considerable improvement over the traditional approach of fine-tuning the pre-trained model. Finally, they have released the pre-trained model alongside a meticulously curated dataset of 800K crystal graphs, so that it can be plugged into any existing or future model to improve prediction accuracy.
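Embedding-level distillation of this kind can be sketched as follows: a frozen pre-trained "teacher" produces a graph embedding, and the property predictor's encoder (the "student") is trained with its usual regression loss plus a term pulling its embedding toward the teacher's. All networks here are toy linear maps, not CrysGNN or the SOTA predictors, and the loss weighting is an assumed hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(1)
n_feat, d = 5, 3

teacher_W = rng.standard_normal((d, n_feat))  # frozen pre-trained encoder
student_W = rng.standard_normal((d, n_feat))  # trainable encoder
head = rng.standard_normal(d)                 # property-prediction head

def loss(x, y, alpha=0.5):
    t = teacher_W @ x                   # teacher embedding (no gradient)
    s = student_W @ x                   # student embedding
    pred = head @ s                     # predicted property value
    distill = np.mean((s - t) ** 2)     # match the teacher's representation
    task = (pred - y) ** 2              # ordinary property regression loss
    return task + alpha * distill       # combined training objective

x = rng.standard_normal(n_feat)         # toy crystal-graph features
total = float(loss(x, y=1.0))
```

The distillation term is what injects the pre-trained structural knowledge even when property-tagged data is scarce.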

Interactive concept bottleneck models

Concept bottleneck models (CBMs) are interpretable neural networks. They first predict labels for human-understandable concepts relevant to the prediction task, and then predict the final label from those concept predictions. The researchers extend CBMs to interactive prediction settings, where the model can ask a human collaborator for the labels of some concepts. They design an interaction policy that decides, at prediction time, which concept labels to request so that the final prediction is as accurate as possible.
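The flavour of such a policy can be sketched in a few lines: score each concept by how uncertain its prediction is and how strongly it influences the final label, then query the human about the highest-scoring one. The scoring rule and the numbers below are illustrative stand-ins, not the paper's exact policy.

```python
import numpy as np

# Predicted probability that each concept is present.
concept_probs = np.array([0.55, 0.95, 0.50, 0.80])
# Magnitude of each concept's effect on the final label (toy values).
final_weights = np.array([2.0, 0.1, 0.3, 1.5])

# Uncertainty peaks at p = 0.5 and vanishes at p in {0, 1}.
uncertainty = 1.0 - np.abs(2.0 * concept_probs - 1.0)
# Greedy policy: ask about the concept that is both uncertain and influential.
score = uncertainty * np.abs(final_weights)
query = int(np.argmax(score))  # index of the concept to ask the human about
```

Here concept 0 wins: it is nearly as uncertain as concept 2 but matters far more to the final prediction, which is exactly the trade-off the policy in the paper is designed to capture.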

The researchers show that a simple policy combining the uncertainty of each concept prediction with that concept's effect on the final prediction outperforms static approaches and active-feature-acquisition methods. On the Caltech-UCSD Birds, CheXpert, and OAI datasets, the interactive CBM improves accuracy by 5–10% over competitive baselines with only five interactions.

Sources of Article

Image source: Unsplash
