This roundup collects some of the most exciting publications in AI research from the past year, spanning developments in artificial intelligence (AI) and data science. The entries are organised chronologically, and each summary links to a lengthier article.

Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild

In this work, the researchers address the difficulty of creating speech from silent lip videos for any speaker in the wild. The task involves several challenges, the most significant of which is that many aspects of the desired target speech, such as voice, pitch, and linguistic content, can only be deduced partially from the silent face footage. To deal with these stochastic variations, the researchers propose a new VAE-GAN architecture that learns to correlate lip and voice sequences despite the variations. Their generator learns to synthesise speech sequences in any voice for any person's lip movements, guided during training by multiple powerful discriminators.
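The VAE half of such an architecture rests on the reparameterization trick and a KL regulariser toward a standard Gaussian prior. Below is a minimal numpy sketch of just those two pieces; the layer shapes, variable names, and linear "encoder" are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Toy linear encoder: maps a "lip feature" vector to the parameters
    # of a diagonal Gaussian over the latent code (mu, log-variance).
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    # Sample z = mu + sigma * eps; in a real model this keeps the
    # sampling step differentiable with respect to mu and logvar.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian, averaged over the batch.
    return float(np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)))

# Illustrative dimensions: 8-dim input features, 4-dim latent code, batch of 16.
x = rng.standard_normal((16, 8))
W_mu = rng.standard_normal((8, 4)) * 0.1
W_logvar = rng.standard_normal((8, 4)) * 0.1

mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
kl = kl_divergence(mu, logvar)
print(z.shape, kl >= 0.0)
```

In the full VAE-GAN, the generator decodes `z` into a speech sequence, and its training objective would add adversarial terms from the discriminators on top of the reconstruction and KL losses sketched here.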

Extensive testing on multiple datasets demonstrates that their method surpasses all baselines by a wide margin. Furthermore, they can fine-tune their network on videos of specific identities to achieve performance comparable to single-speaker models trained on four times more data. Finally, the researchers undertake several ablation studies to investigate the impact of various modules in their architecture. 

Adaptive mixing of auxiliary losses in supervised learning

In several supervised learning scenarios, auxiliary losses are used to contribute additional information or constraints to the supervised learning objective. For example, in rule-based approaches, labelling functions provide noisy approximations of the true labels and thus contribute only limited labelling information. The researchers address the issue of combining these losses in a principled fashion. 

Their proposal, AMAL, employs a bi-level optimisation criterion on validation data to discover optimal blending weights on a per-instance basis over training data. The researchers describe a meta-learning strategy for solving this bi-level objective and demonstrate how it can be applied to various supervised learning scenarios. Experiments conducted in diverse knowledge distillation and rule-denoising domains show that AMAL outperforms competitive baselines in these areas. Finally, they perform an empirical analysis of their method and provide insight into the performance-enhancing mechanisms it employs.
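The inner step of such an approach, blending the primary and auxiliary losses with per-instance weights, can be sketched as follows. This is only the forward mixing computation under assumed names; AMAL's actual contribution is learning the weights via the bi-level meta-learning objective on validation data, which is not shown here.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def mixed_loss(primary, auxiliary, logits):
    # Per-instance blend of a primary and an auxiliary loss.
    # `logits` holds learnable mixing parameters (one pair per instance);
    # a softmax keeps each instance's weights positive and summing to 1.
    weights = np.apply_along_axis(softmax, 1, logits)          # shape (n, 2)
    per_instance = weights[:, 0] * primary + weights[:, 1] * auxiliary
    return float(per_instance.mean())

primary = np.array([0.9, 0.2, 0.5])    # e.g. cross-entropy on true labels
auxiliary = np.array([0.4, 0.8, 0.1])  # e.g. a distillation or rule-based loss
logits = np.zeros((3, 2))              # zero logits: equal 50/50 mixing to start

print(mixed_loss(primary, auxiliary, logits))
```

In the bi-level setting, `logits` would be updated by differentiating the validation loss through a training step, so that instances whose auxiliary signal is noisy receive lower auxiliary weight.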

Clustering What Matters: Optimal Approximation for Clustering with Outliers

One of the primary issues in computer science is clustering with outliers. Given an n-point set X and two integers k and m, clustering with outliers attempts to eliminate m points from X and partition the remaining points into k clusters that minimise a specific cost function. The researchers present a generic approach to clustering with outliers in this study, resulting in a fixed-parameter tractable (FPT) algorithm in k and m that almost equals the approximation ratio for its outlier-free version. 
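The objective above is easy to state concretely for k-Means with outliers: each kept point pays its squared distance to the nearest center, and the m most expensive points are discarded. The following is a small illustrative sketch of that cost function only (not of the paper's FPT algorithm); the function name and toy data are assumptions for the example.

```python
import numpy as np

def outlier_kmeans_cost(X, centers, m):
    # Cost of k-Means with outliers: every point pays its squared distance
    # to the nearest center, then the m most expensive points are dropped.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    nearest = d2.min(axis=1)
    kept = np.sort(nearest)[: len(X) - m]
    return float(kept.sum())

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
centers = np.array([[0.0, 0.0]])

# With m = 1, the far point (10, 10) is treated as the outlier,
# leaving a cost of 0 + 1 + 1 = 2.
print(outlier_kmeans_cost(X, centers, m=1))
```

The hard part, which the paper addresses, is choosing the centers and the outlier set jointly so that this cost is (approximately) minimised, which this sketch does not attempt.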

In general metrics, the researchers obtain FPT approximation methods with optimal approximation ratios for k-Median and k-Means with outliers. They also show how their approach can be applied to variants of the problem that place additional constraints on the clustering, such as fairness or matroid requirements.

