This roundup collects the year's most interesting AI research papers, spanning breakthroughs in artificial intelligence (AI) and data science. It is organized chronologically, and each entry links to a longer article.
GCR: Gradient coreset-based replay buffer selection for continual learning
Continual learning (CL) seeks to develop approaches by which a single model can adapt to an increasing number of sequentially encountered tasks, ideally leveraging learning across tasks in a resource-efficient manner. A significant obstacle for CL systems, however, is catastrophic forgetting: previously learned tasks are forgotten while a new one is being learned.
Gradient Coreset Replay (GCR) is a novel replay-buffer selection and update strategy that employs a carefully designed optimization criterion. Specifically, the authors select and maintain a "coreset" whose gradient with respect to the current model parameters closely approximates the gradient of all the data observed so far, and they explore the strategies needed to implement this effectively in a continual learning setting.
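To make the gradient-matching idea concrete, here is a minimal PyTorch sketch of selecting a coreset whose gradients approximate the gradient of all data seen so far. This is a simplified greedy stand-in, not the paper's algorithm (GCR maintains a weighted coreset and optimizes a more elaborate criterion); the function name, selection strategy, and tensor shapes are illustrative assumptions.

```python
import torch

def select_gradient_coreset(per_sample_grads: torch.Tensor, k: int) -> list:
    """Greedily pick k samples whose mean gradient best matches the
    mean gradient of all data seen so far (a simplified sketch of the
    gradient-coreset idea; hypothetical helper, not GCR itself).

    per_sample_grads: (N, D) tensor, one flattened gradient per sample.
    """
    target = per_sample_grads.mean(dim=0)       # gradient of all the data
    residual = target.clone()
    selected = []
    for _ in range(k):
        # score remaining samples by alignment with the unmatched residual
        scores = per_sample_grads @ residual
        if selected:
            scores[selected] = float("-inf")    # never pick a sample twice
        selected.append(int(scores.argmax()))
        # residual = what the current coreset's mean gradient still misses
        residual = target - per_sample_grads[selected].mean(dim=0)
    return selected
```

In practice one would compute `per_sample_grads` from the current model on the replay candidates and re-run the selection whenever the buffer is updated; the paper additionally keeps per-sample weights rather than an unweighted subset.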
In the well-studied offline continual learning setting, the researchers demonstrate significant gains (2%-4% absolute) over the state of the art. Their approach also transfers efficiently to online/streaming CL settings, yielding gains of up to 5% over existing methods. Finally, they demonstrate the value of supervised contrastive loss for continual learning, which provides a cumulative accuracy improvement of up to 5% when combined with their subset selection technique.
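The supervised contrastive loss the authors combine with replay is a published objective (Khosla et al., 2020) rather than something introduced here; a minimal PyTorch sketch follows, with the function name and default temperature chosen for illustration.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss: pull together embeddings that share
    a label, push apart the rest. features: (N, D); labels: (N,)."""
    z = F.normalize(features, dim=1)                 # unit-norm embeddings
    logits = z @ z.T / temperature                   # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(self_mask, float("-inf"))  # drop self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # positives: other samples in the batch with the same label
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)
    # average log-probability over each anchor's positives
    return -(pos_log_prob.sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)).mean()
```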
Merry Go Round: Rotate a Frame and Fool a DNN
The majority of first-person videos captured today come from wearable cameras. As with other computer vision tasks, most state-of-the-art (SOTA) egocentric vision systems rely on Deep Neural Networks (DNNs). DNNs, however, are well known to be vulnerable to Adversarial Attacks (AAs) that add imperceptible noise to the input, and both black-box and white-box attacks have been demonstrated on image and video analysis tasks.
The researchers note that most AA approaches perturb an image's intensity values, and that for videos the same process is simply repeated for each frame separately. They emphasise that the notion of imperceptibility used for images may not carry over to videos, where a random shift in intensity between two consecutive frames may still be noticeable.
As the key innovation of this study, the authors propose perturbing optical flow to mount AAs on a video analysis system. Such perturbation is particularly well suited to egocentric videos: these recordings already contain a great deal of camera shake, so adding a little more is very hard to detect. More generally, their proposal can be interpreted as adding structured, parametric noise as the adversarial perturbation. Implementing the idea by applying 3D rotations to the frames, they show that their technique can mount a black-box AA on an egocentric activity-detection system with one-third fewer queries than the SOTA AA technique.
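To illustrate what a small 3D-rotation perturbation looks like in code, the sketch below warps a frame as if the camera had rotated slightly, using the standard homography H = K R K^-1 for a pure camera rotation. The function name, default focal length, and angle parameterization are assumptions; the paper's attack additionally searches, in a black-box fashion, for the per-frame rotations that flip the model's prediction.

```python
import cv2
import numpy as np

def rotate_frame(frame, yaw_deg=0.0, pitch_deg=0.0, roll_deg=0.0, f=None):
    """Warp a frame as if the camera underwent a small 3D rotation.
    Small angles mimic the natural shake of egocentric video."""
    h, w = frame.shape[:2]
    f = f if f is not None else float(max(h, w))   # assumed focal length (px)
    K = np.array([[f, 0, w / 2],
                  [0, f, h / 2],
                  [0, 0, 1]], dtype=np.float64)    # pinhole intrinsics
    rx, ry, rz = np.deg2rad([pitch_deg, yaw_deg, roll_deg])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    H = K @ (Rz @ Ry @ Rx) @ np.linalg.inv(K)      # homography for rotation
    return cv2.warpPerspective(frame, H, (w, h))

# e.g. a barely perceptible half-degree yaw applied to one frame:
# perturbed = rotate_frame(frame, yaw_deg=0.5)
```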
Multi-modal Extreme Classification
This study introduces MUFIN, an approach to extreme classification (XC) problems involving millions of labels, where data points come with both visual and textual descriptors. Applications of MUFIN to product-to-product recommendation and bid-query prediction across millions of products are presented. Modern multimodal techniques usually rely solely on embedding-based methods; XC approaches, by contrast, employ classifier architectures that yield more accurate results than embedding-only methods but have focused mainly on text-based classification problems.
MUFIN builds an architecture based on cross-modal attention and trains it modularly, utilising pre-training along with positive and negative mining. A novel product-to-product recommendation dataset, MM-AmazonTitles-300K, comprising approximately 300K products with titles and multiple images, was gathered from publicly available Amazon.com listings. On the MM-AmazonTitles-300K and Polyvore datasets, as well as a dataset with over 4 million labels drawn from Bing search-engine click logs, MUFIN was at least 3% more accurate than leading text-based, image-based, and multimodal approaches.
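As a rough illustration of the kind of cross-modal attention block such an architecture is built from, the PyTorch sketch below lets a product's text tokens attend over its image embeddings. The class name, dimensions, and residual layout are assumptions for the sake of a runnable example, not MUFIN's exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Text tokens (queries) attend over image embeddings (keys/values),
    producing a fused multimodal representation (illustrative sketch)."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_tokens):
        fused, _ = self.attn(text_tokens, image_tokens, image_tokens)
        return self.norm(text_tokens + fused)    # residual + layer norm

# toy usage: 8 text tokens attending over 3 image embeddings per product
block = CrossModalAttention()
text = torch.randn(2, 8, 256)     # (batch, text tokens, dim)
images = torch.randn(2, 3, 256)   # (batch, images per product, dim)
out = block(text, images)         # (2, 8, 256) fused representation
```

In an XC pipeline, a fused representation like this would plausibly feed both an embedding head and the per-label classifiers the summary mentions.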