Results for ""
This list collects the year's most intriguing AI research publications, combining breakthroughs in artificial intelligence (AI) with data science. It is arranged chronologically, and each entry links to a longer article.
Multimodal natural language processing has found many applications, but few studies have focused on multimodal relational lexical semantics. This study presents the researchers' first attempt to use visual cues to identify lexico-semantic relationships, which capture language phenomena such as synonymy, co-hyponymy, and hypernymy.
The researchers suggest that visual information can supplement textual information, drawing on the linguistic theory of semiotic textology, whereas conventional approaches rely on the paradigmatic system alone. To this end, they automatically augment two gold-standard datasets with visual information and design several fusion strategies that combine the textual and visual modalities using a patch-based approach. Experimental results on the multimodal datasets show that visual information can reliably boost performance by filling semantic gaps in textual encodings.
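As a rough illustration of this kind of patch-based fusion, here is a minimal sketch of combining a textual word-pair encoding with pooled visual patch features. The layer names, dimensions, relation classes, and concatenation strategy are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy fusion of a textual word-pair encoding with visual patch features.

    Hypothetical dimensions: 300-d text vectors, 512-d patch features, and
    four lexico-semantic relation classes (e.g. synonymy, co-hyponymy,
    hypernymy, random). The paper's actual fusion strategies may differ.
    """

    def __init__(self, text_dim=300, patch_dim=512, num_classes=4):
        super().__init__()
        # Pool a variable number of patch features into one visual vector.
        self.patch_pool = nn.AdaptiveAvgPool1d(1)
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + patch_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, text_vec, patch_feats):
        # text_vec: (batch, text_dim); patch_feats: (batch, patch_dim, n_patches)
        visual_vec = self.patch_pool(patch_feats).squeeze(-1)
        # Simple concatenation fusion of the two modalities.
        fused = torch.cat([text_vec, visual_vec], dim=-1)
        return self.classifier(fused)
```

Concatenation is only the simplest fusion choice; the point is that the visual vector enters the classifier alongside the textual encoding, letting the model fill gaps in the text-only signal.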
Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos
The researchers provide 3MASSIV, a multilingual, multimodal, and multi-aspect collection of short videos drawn from the Moj short-video social media network. 3MASSIV comprises 50k labeled short videos (average duration 20 seconds) and 100k unlabeled videos in 11 different languages, capturing the most common short-video trends, such as pranks, fails, romance, and comedy, in original audio-visual formats like self-shot videos, reaction videos, lip-syncing, and self-sung songs. Moreover, by annotating these videos for concepts, affective states, media types, and audio language, 3MASSIV opens up multimodal and multilingual semantic understanding.
The researchers provide a comprehensive analysis of 3MASSIV, highlighting the dataset's diversity and distinctive features compared to other popular contemporary datasets, along with strong baselines. They also demonstrate that the social media content in 3MASSIV is dynamic and temporal in nature, making it useful for cross-lingual analysis and semantic understanding tasks.
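To make the multi-aspect annotation concrete, here is a sketch of what one annotation record might look like. The field names and values are hypothetical; 3MASSIV's actual schema may differ.

```python
from dataclasses import dataclass, field

@dataclass
class VideoAnnotation:
    """Hypothetical record for one 3MASSIV video's multi-aspect labels."""
    video_id: str
    audio_language: str                                     # one of the 11 languages
    concepts: list = field(default_factory=list)            # e.g. ["comedy", "prank"]
    affective_states: list = field(default_factory=list)    # e.g. ["amusement"]
    media_type: str = "self-shot"                           # e.g. self-shot, reaction, lip-sync

sample = VideoAnnotation(
    video_id="clip_00042",
    audio_language="Hindi",
    concepts=["comedy"],
    affective_states=["amusement"],
    media_type="reaction",
)
```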
A Framework for Learning Ante-hoc Explainable Models via Concepts
Self-explaining deep models are designed to learn latent concept-based explanations implicitly during training, so no post-hoc explanation-generation techniques are needed. In this work, the researchers propose a model that attaches an explanation-generation module to any base network and trains the whole system jointly. The model achieves high predictive performance while producing explanations that are meaningful in terms of concepts. Moreover, their training method works well for unsupervised concept learning and requires a much smaller parameter space than baseline methods.
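The following minimal sketch illustrates the general ante-hoc idea of routing predictions through a concept layer and training everything end to end. The module names, dimensions, and loss weighting are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class AnteHocConceptModel(nn.Module):
    """Base encoder -> concept scores -> label prediction, trained jointly."""

    def __init__(self, encoder, feat_dim=512, num_concepts=32, num_classes=10):
        super().__init__()
        self.encoder = encoder                          # any base network
        self.concept_head = nn.Linear(feat_dim, num_concepts)
        self.classifier = nn.Linear(num_concepts, num_classes)

    def forward(self, x):
        feats = self.encoder(x)
        # Concept activations serve as the built-in explanation.
        concepts = torch.sigmoid(self.concept_head(feats))
        # The label is predicted from the concepts, not raw features.
        logits = self.classifier(concepts)
        return logits, concepts

def joint_loss(logits, labels, concepts, concept_targets=None, alpha=0.5):
    """Task loss plus optional concept supervision when annotations exist."""
    loss = nn.functional.cross_entropy(logits, labels)
    if concept_targets is not None:
        loss = loss + alpha * nn.functional.binary_cross_entropy(
            concepts, concept_targets
        )
    return loss
```

Because the classifier sees only the concept activations, the explanation is produced during training itself rather than reconstructed afterward, which is what distinguishes ante-hoc from post-hoc approaches.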
Their proposed model also includes a way to use self-supervision to obtain better concept explanations, and with full concept supervision it achieves the best predictive results among recently proposed concept-based explainable models. The researchers report qualitative and quantitative results showing that their method outperforms recent concept-based explainability methods, with detailed experiments on two datasets without ground-truth concepts (CIFAR10 and ImageNet) and two with ground-truth concepts (AwA2 and CUB-200), demonstrating that the method works in both settings.