These are the year's most intriguing artificial intelligence research papers, combining advances in AI with those in data science. The list is arranged chronologically, and each entry points to the full paper.

The Pitfalls of Simplicity Bias in Neural Networks

Several papers have proposed Simplicity Bias (SB), the tendency of standard training procedures such as Stochastic Gradient Descent (SGD) to find simple models, to explain why neural networks generalize well. However, the precise notion of simplicity remains vague. Furthermore, previous settings that use SB to theoretically justify why neural networks generalize well do not simultaneously capture the non-robustness of neural networks, a phenomenon widely observed in practice.

The researchers attempt to reconcile SB and the superior standard generalization of neural networks with the non-robustness observed in practice by creating datasets that 

(a) include a precise notion of simplicity, 

(b) include multiple predictive features with varying levels of simplicity, and 

(c) capture the non-robustness of neural networks trained on real data (a toy construction in this spirit is sketched below). 
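
As a rough, hypothetical illustration of properties (a)-(c), the sketch below builds a synthetic dataset in which one coordinate is a simple, linearly separable feature and the remaining coordinates form a more complex but equally predictive feature. It is not the paper's actual construction; all names, noise levels, and thresholds are illustrative assumptions.

```python
import numpy as np

def make_simple_vs_complex(n=10_000, noise=0.05, seed=0):
    """Toy dataset with two predictive feature groups of different simplicity.

    Column 0  : simple feature  -- the label's sign plus small noise,
                linearly separable on its own.
    Columns 1+: complex feature -- the label is encoded in which of two
                rings a 2-D point lies on, so it needs a non-linear rule.
    Both feature groups predict the label; they differ only in how simple
    a decision rule each one admits.
    """
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n)

    # Simple feature: one coordinate aligned with the label.
    simple = y + noise * rng.standard_normal(n)

    # Complex feature: radius-1 ring for y = -1, radius-2 ring for y = +1.
    theta = rng.uniform(0, 2 * np.pi, size=n)
    radius = np.where(y == 1, 2.0, 1.0) + noise * rng.standard_normal(n)
    rings = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)

    X = np.column_stack([simple, rings])
    return X, y

X, y = make_simple_vs_complex()
print(X.shape, y.shape)   # (10000, 3) (10000,)
```

A linear classifier can separate this data using column 0 alone, while the ring structure in columns 1-2 requires a non-linear decision rule; this is the kind of simplicity gap such datasets are designed to expose.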

The researchers make four observations based on theory and empirical study on these datasets: 

(i) The SB of SGD and its variants can be extreme: neural networks can rely exclusively on the simplest feature and remain invariant to all more complex features (a simple randomization probe for this behavior is sketched after the list). 

(ii) The extreme nature of SB could explain why seemingly insignificant distribution shifts and modest adversarial perturbations affect model performance dramatically. 

(iii) Contrary to popular belief, SB can also hurt generalization on the same data distribution, since SB persists even when the simplest feature has less predictive power than the more complex features. 

(iv) Common methods for improving generalization and robustness, like ensembles and adversarial training, may fail to mitigate SB and associated problems. 
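
One simple way to test for the extreme reliance described in observation (i) is a randomization probe: shuffle one group of feature columns across examples and measure how much a trained model's accuracy drops. The helper below is a minimal sketch under assumed conventions (a scikit-learn-style classifier and an assumed column split), not the researchers' evaluation protocol.

```python
import numpy as np

def feature_reliance(model, X, y, columns, seed=0):
    """Accuracy drop when the given feature columns are shuffled across samples.

    A large drop means the model relies on those columns; a near-zero drop on
    the complex columns is the signature of extreme simplicity bias.
    """
    rng = np.random.default_rng(seed)
    base_acc = np.mean(model.predict(X) == y)

    X_shuffled = X.copy()
    perm = rng.permutation(len(X))
    X_shuffled[:, columns] = X[perm][:, columns]   # break the feature-label link
    shuffled_acc = np.mean(model.predict(X_shuffled) == y)
    return base_acc - shuffled_acc

# Usage with any scikit-learn-style classifier `clf` trained on (X, y):
# drop_simple  = feature_reliance(clf, X, y, columns=[0])
# drop_complex = feature_reliance(clf, X, y, columns=[1, 2])
```

On a dataset like the toy one above, a network exhibiting extreme SB would show a large drop when the simple column is shuffled and essentially no drop when the complex columns are.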

Given the importance of SB in neural network training, the researchers expect that the proposed datasets and methodologies will serve as an excellent testbed for evaluating novel algorithmic approaches aimed at avoiding SB's flaws.

Statistical Optimal Transport posed as Learning Kernel Embedding

The goal of statistical Optimal Transport (OT) is to reliably estimate the optimal transport plan/map from samples drawn from the given source and target marginal distributions. This work is distinctive in that it frames statistical OT as the problem of estimating the kernel mean embedding of the transport plan from sample data. 
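
To unpack the term: the kernel mean embedding of a distribution P under a kernel k is the RKHS element mu_P = E[k(x, .)], and its empirical estimate is simply the average of kernel functions centered at the observed samples. The snippet below is a generic numpy sketch with a Gaussian kernel; the bandwidth and function names are illustrative assumptions, not part of the paper.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for all pairs of rows."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def empirical_mean_embedding(samples, query_points, bandwidth=1.0):
    """Evaluate the empirical kernel mean embedding mu_hat(.) at query points.

    mu_hat(z) = (1/n) * sum_i k(x_i, z): the sample average of kernel feature
    maps, a smoothed RKHS-valued summary of the sample.
    """
    K = gaussian_kernel(samples, query_points, bandwidth)   # shape (n, m)
    return K.mean(axis=0)                                   # shape (m,)

# Example: embed 500 draws from a 2-D standard normal and evaluate on a grid.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
grid_axis = np.linspace(-3, 3, 5)
grid = np.stack(np.meshgrid(grid_axis, grid_axis), axis=-1).reshape(-1, 2)
print(empirical_mean_embedding(X, grid).shape)   # (25,)
```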

The proposed estimator avoids overfitting through a form of regularization based on the maximum mean discrepancy (MMD), which is complementary to the more commonly used φ-divergence (entropy) based regularization. A key finding is that, under fairly benign conditions, both the transport plan and the transport map based on the Barycentric projection can be recovered with a dimension-free sample complexity. 
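
The MMD used for regularization is simply the RKHS distance between two kernel mean embeddings, and it admits a closed-form estimate from samples. The sketch below is a generic (biased, V-statistic) estimator of MMD^2 with a Gaussian kernel, shown only to make the regularizer concrete; it is not the paper's full MMD-regularized OT objective.

```python
import numpy as np

def mmd_squared(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 between samples X and Y.

    MMD^2(P, Q) = ||mu_P - mu_Q||_RKHS^2
                = E[k(x, x')] - 2 * E[k(x, y)] + E[k(y, y')].
    """
    def k(A, B):
        sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2 * bandwidth ** 2))

    return k(X, X).mean() - 2 * k(X, Y).mean() + k(Y, Y).mean()

# MMD^2 is near zero for two samples from the same distribution and clearly
# positive when the distributions differ.
rng = np.random.default_rng(0)
same = mmd_squared(rng.standard_normal((300, 2)), rng.standard_normal((300, 2)))
diff = mmd_squared(rng.standard_normal((300, 2)), rng.standard_normal((300, 2)) + 2.0)
print(round(same, 4), round(diff, 4))
```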

In addition, out-of-sample estimation is possible thanks to the smoothing implicit in the kernel mean embeddings. By proving a suitable representer theorem, the authors obtain a kernelized convex formulation for the estimator, which may be useful for performing OT in non-standard domains. Experiments demonstrate the effectiveness of the proposed approach.
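
To make the Barycentric projection and the out-of-sample idea concrete, here is a generic sketch: a discrete transport plan between source samples x_i and target samples y_j (obtained here with a plain entropic/Sinkhorn solver, not the paper's MMD-regularized estimator) yields a Barycentric map that sends each x_i to the plan-weighted average of the y_j, and kernel smoothing of those images gives an estimate at unseen points. The solver, bandwidth, and other settings are illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(X, Y, eps=0.05, iters=500):
    """Entropy-regularized OT plan between uniform empirical measures on X and Y."""
    n, m = len(X), len(Y)
    C = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)   # squared-Euclidean cost
    C = C / C.max()                                             # rescale for stability
    K = np.exp(-C / eps)
    a, b = np.ones(n) / n, np.ones(m) / m                       # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                                      # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]                          # plan of shape (n, m)

def barycentric_map(plan, Y):
    """T(x_i) = sum_j plan_ij * y_j / sum_j plan_ij (plan-weighted target average)."""
    return (plan @ Y) / plan.sum(axis=1, keepdims=True)

def out_of_sample_map(x_new, X, T_X, bandwidth=0.5):
    """Kernel-smoothed extension of the map from the source samples to new points."""
    sq = np.sum((x_new[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq / (2 * bandwidth ** 2))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ T_X

# Example: transport a standard 2-D Gaussian sample onto a shifted one.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 2)) + 3.0
plan = sinkhorn_plan(X, Y)
T_X = barycentric_map(plan, Y)                 # images of the source samples
print(out_of_sample_map(rng.standard_normal((5, 2)), X, T_X).round(2))
```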

Self-Supervised Few-Shot Learning on Point Clouds

Massive point clouds are gaining popularity in both industry and academia due to their versatility and increasing accessibility in fields such as robotics, shape synthesis, and autonomous vehicles. Deep neural networks trained on labeled point clouds have recently shown encouraging results on supervised learning tasks like classification and segmentation. However, supervised learning requires the point clouds to be annotated, which is a time-consuming and laborious process. 

To address this issue, the authors propose two novel self-supervised pre-training tasks that encode a hierarchical partitioning of the point clouds using a cover-tree, where point cloud subsets lie within balls of varying radii at each level of the tree. Moreover, in a few-shot learning (FSL) setting, their self-supervised network can be pre-trained using only the support set (consisting of a limited number of training examples) that is also used to train the downstream network. 
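
The paper's exact cover-tree construction and pre-training objectives are more involved; as a rough, hypothetical sketch of the underlying idea, the code below greedily partitions a point cloud into balls whose radius halves at each level and records, for every point, the index of the ball it falls into, which could serve as a self-supervised label at that level. The greedy center selection and all parameters are assumptions made for illustration.

```python
import numpy as np

def ball_partition(points, radius, rng):
    """Greedy cover: pick centers so every point lies in some ball of the given radius.

    Returns, for each point, the index of the ball (center) it was assigned to.
    This is a rough stand-in for one level of a cover-tree-style partition.
    """
    unassigned = np.ones(len(points), dtype=bool)
    labels = np.full(len(points), -1)
    ball_id = 0
    while unassigned.any():
        center = points[rng.choice(np.flatnonzero(unassigned))]
        in_ball = np.linalg.norm(points - center, axis=1) <= radius
        labels[in_ball & unassigned] = ball_id
        unassigned &= ~in_ball
        ball_id += 1
    return labels

def hierarchical_ball_labels(points, levels=3, base_radius=1.0, seed=0):
    """Self-supervised labels: ball membership at several levels, radius halving each level."""
    rng = np.random.default_rng(seed)
    return {level: ball_partition(points, base_radius / (2 ** level), rng)
            for level in range(levels)}

# Example on a random point cloud of 1024 points in 3-D.
cloud = np.random.default_rng(0).uniform(-1.0, 1.0, size=(1024, 3))
labels = hierarchical_ball_labels(cloud)
print({level: len(np.unique(lab)) for level, lab in labels.items()})   # balls per level
```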

Finally, the point embeddings from the fully trained self-supervised network are fed into the network that performs the downstream task. The researchers provide an extensive empirical evaluation of their approach, showing that supervised methods pre-trained with their self-supervised learning scheme considerably improve the accuracy of state-of-the-art methods on downstream classification and segmentation tasks. On downstream classification, their approach also outperforms earlier unsupervised methods.
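
The transfer pattern described here is the familiar pre-train-then-downstream pipeline. Below is a minimal PyTorch-style sketch of that pattern; the encoder architecture, label counts, and hyper-parameters are placeholders, not the networks used in the paper.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Placeholder point-cloud encoder: a shared per-point MLP followed by max pooling."""
    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, pts):                       # pts: (batch, n_points, 3)
        return self.mlp(pts).max(dim=1).values    # (batch, dim) global embedding

# Stage 1: pre-train the encoder with a self-supervised head (for instance,
# predicting ball membership from a hierarchical partition) on the support set.
encoder = PointEncoder()
ssl_head = nn.Linear(128, 16)                     # 16 = assumed number of balls at one level
ssl_optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(ssl_head.parameters()), lr=1e-3)
# ... self-supervised training loop would go here ...

# Stage 2: feed the resulting point-cloud embeddings to the downstream classifier.
downstream_head = nn.Linear(128, 40)              # 40 = assumed number of object classes
with torch.no_grad():                             # frozen-encoder variant
    embedding = encoder(torch.randn(8, 1024, 3))  # a dummy batch of 8 point clouds
logits = downstream_head(embedding)
print(logits.shape)                               # torch.Size([8, 40])
```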
