Results for ""
These are the year's most intriguing AI research publications, spanning innovations in AI and data science. The list is organized chronologically, and each entry links to a longer article.
The researchers provide a systematic algorithm for solving constrained classification problems. These problems involve optimizing a target metric, such as the F-measure or G-mean, while satisfying constraints such as demographic-parity fairness or coverage. Both the objective and the constraints are expressed as general functions of the confusion matrix. Their methodology simplifies the problem by breaking it down into a sequence of classifier learning tasks that are easy to combine. The reduction is accomplished by formulating the learning problem as an optimization over the intersection of two sets.
By separating out the constraint space in this way, the problem can be solved with Frank-Wolfe style optimization on each set individually, since the objective and constraints are convex functions of the confusion matrix. The researchers provide empirical evidence that their algorithm is competitive with previous techniques while being more robust to variations in hyper-parameter choices.
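To make the optimization style concrete, below is a minimal Frank-Wolfe sketch in Python. Everything here is an illustrative assumption rather than the paper's algorithm: the toy objective, the probability simplex standing in for a feasible set, and the linear minimization oracle are all invented for this example.

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, num_iters=200):
    """Minimize a smooth convex f over a convex set using Frank-Wolfe.

    grad_f: gradient of the objective at the current point.
    lmo:    linear minimization oracle; returns argmin over the set
            of <g, s> for gradient g.
    x0:     a feasible starting point.
    """
    x = x0
    for t in range(num_iters):
        g = grad_f(x)
        s = lmo(g)                  # best feasible point for the linearized objective
        gamma = 2.0 / (t + 2.0)     # standard diminishing step size
        x = x + gamma * (s - x)     # convex combination: stays feasible
    return x

# Toy problem: minimize ||c - c_target||^2 over the probability simplex,
# with the flattened "confusion matrix" c treated as a simplex point.
c_target = np.array([0.7, 0.1, 0.1, 0.1])
grad_f = lambda c: 2.0 * (c - c_target)
lmo = lambda g: np.eye(len(g))[np.argmin(g)]   # simplex LMO: pick the best vertex
x = frank_wolfe(grad_f, lmo, np.full(4, 0.25))
print(np.round(x, 3))   # converges toward c_target
```

The property Frank-Wolfe relies on is that each iterate is a convex combination of feasible points, so the loop never leaves the feasible set; that is what makes optimizing over each set in the intersection separately tractable.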
The researchers propose a method for lifelong/continual learning of convolutional neural networks (CNNs) that avoids catastrophic forgetting when transitioning between tasks. They demonstrate that a CNN trained on a previous task can be adjusted with a small number of calibration parameters to make its activation maps applicable to the new task.
Using spatial and channel-wise calibration modules, the researchers adjust the activation maps generated by each network layer, then train only these calibration parameters for each new task, enabling lifelong learning. The calibration modules require notably less computation and far fewer parameters than methods that dynamically grow the network. The approach resists catastrophic forgetting because the task-adaptive calibration parameters, which encode all the task-specific knowledge, are retained separately for each task.
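As a rough illustration of the idea, here is a minimal PyTorch-style sketch of what spatial and channel-wise calibration could look like. The module design, names, and initialization are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CalibrationModule(nn.Module):
    """Task-specific recalibration of a frozen layer's activation maps.

    channel_scale: one learned multiplier per channel (channel-wise).
    spatial:       a cheap depthwise 3x3 conv that reweights locations
                   (spatial). Only these parameters are trained for a
                   new task; the backbone stays frozen, so knowledge of
                   earlier tasks is never overwritten.
    """
    def __init__(self, channels):
        super().__init__()
        self.channel_scale = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3,
                                 padding=1, groups=channels, bias=False)
        # Start as the identity: each depthwise filter is a centered delta.
        nn.init.dirac_(self.spatial.weight, groups=channels)

    def forward(self, x):
        return self.spatial(x * self.channel_scale)

# Usage: recalibrate the output of a frozen backbone layer for a new task.
feats = torch.randn(8, 64, 32, 32)   # activations from the frozen layer
calib = CalibrationModule(64)        # fresh parameters for the new task
print(calib(feats).shape)            # torch.Size([8, 64, 32, 32])
```

Because only the per-channel scales and the small depthwise convolution are trained per task, the parameter overhead stays tiny relative to the frozen backbone, which is the property the summary above highlights.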
In addition, their methodology eliminates the need to store data samples from previous tasks, a requirement common to many replay-based systems. The researchers conducted comprehensive experiments on benchmark datasets, including SVHN, CIFAR, ImageNet, and MS-Celeb, with results consistently showing significant improvements over the most advanced techniques: for instance, a 29% absolute increase in accuracy on CIFAR-100 when learning ten classes at a time. On large-scale datasets, the method yields a 23.8% accuracy increase on ImageNet-100 and a 9.7% increase on MS-Celeb-10K. These improvements are achieved with a small number of task-adaptive calibration parameters, accounting for only 0.51% and 0.35% of the model parameters, respectively.
The researchers propose a framework based on energy-based models for several structured prediction tasks in Sanskrit. Like graph-based parsing techniques, their framework is an arc-factored model. It covers dependency parsing, word segmentation, morphological parsing, syntactic linearization, and prosodification, a prosody-level task that the researchers introduce in this work.
Their system performs search-based structured prediction: it takes as input a graph whose nodes represent the relevant linguistic units and whose edges represent the associations between them. For morphosyntactic tasks in morphologically rich languages, the performance of the most advanced models still typically depends on hand-crafted features.
Here, however, the researchers learn the feature function automatically. Together with the constructed search space, the learned feature function encodes the linguistic information relevant to the tasks under consideration. As a result, the researchers can cut the training data requirement to as little as 10% of what state-of-the-art neural models need. Experiments on Sanskrit and Czech demonstrate that the framework is language-agnostic, as the researchers obtain highly competitive models for each language.
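To illustrate the arc-factored formulation described above, here is a toy Python sketch in which the energy of a candidate structure decomposes into a sum of per-edge energies, and inference greedily picks the lowest-energy head for each node. The energies, the greedy decoder, and all names are invented for illustration; real graph-based systems enforce well-formedness, for example with maximum spanning tree decoding.

```python
def edge_energy(head, dep, energies):
    """Stand-in for the learned arc scorer: lower energy = better arc."""
    return energies.get((head, dep), float("inf"))

def decode(nodes, energies, root=0):
    """Greedy arc-factored inference: every non-root node picks its
    lowest-energy head. Tree-wellformedness checks are omitted here;
    real systems enforce them (e.g., via maximum spanning trees)."""
    arcs = []
    for dep in nodes:
        if dep == root:
            continue
        head = min((h for h in nodes if h != dep),
                   key=lambda h: edge_energy(h, dep, energies))
        arcs.append((head, dep))
    total = sum(edge_energy(h, d, energies) for h, d in arcs)
    return arcs, total

# Toy per-arc energies for (head, dependent) pairs; lower is better.
energies = {(0, 1): 0.2, (2, 1): 0.9,
            (1, 2): 0.1, (0, 2): 0.8,
            (2, 3): 0.3, (1, 3): 0.5}
arcs, total = decode([0, 1, 2, 3], energies)
print(arcs, round(total, 1))   # [(0, 1), (1, 2), (2, 3)] 0.6
```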
Furthermore, language-specific constraints can be added to the framework to filter candidates and reduce the search space during inference. By adding such constraints to the model, the researchers achieved notable improvements on morphosyntactic tasks for Sanskrit. For every Sanskrit task they discuss, their data-driven approach is either the only one available or achieves state-of-the-art results.
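A hypothetical sketch of such constraint-based pruning: candidate arcs that violate a language-specific rule are dropped before inference runs, shrinking the search space. The agreement rule below is invented purely for illustration.

```python
def agrees(head, dep, morph):
    """Invented rule: keep an arc only if head and dependent share number."""
    return morph[head]["number"] == morph[dep]["number"]

morph = {0: {"number": "sg"}, 1: {"number": "sg"},
         2: {"number": "pl"}, 3: {"number": "sg"}}

candidate_arcs = [(0, 1), (2, 1), (1, 2), (0, 3), (1, 3)]
pruned = [(h, d) for h, d in candidate_arcs if agrees(h, d, morph)]
print(pruned)   # [(0, 1), (0, 3), (1, 3)]: arcs touching node 2 dropped
```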