Audio codecs are used to efficiently compress audio to reduce either storage requirements or network bandwidth. While existing codecs, such as Opus and EVS, leverage expert knowledge of human perception as well as carefully engineered signal processing pipelines to maximise compression efficiency, there has been recent interest in replacing these handcrafted pipelines with machine learning approaches that learn to encode audio in a data-driven manner.
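To get a feel for the savings involved, compare raw PCM audio with the low bitrates targeted by neural codecs. The 3 kbps figure below is illustrative of the low end of SoundStream's operating range; treat the numbers as a back-of-the-envelope sketch rather than a benchmark.

```python
# Rough comparison of uncompressed PCM vs. a low-bitrate neural codec.
# The 3 kbps operating point is illustrative; real bitrates depend on
# the codec configuration.

sample_rate_hz = 24_000     # samples per second, mono
bits_per_sample = 16        # uncompressed PCM depth

pcm_bitrate_kbps = sample_rate_hz * bits_per_sample / 1000    # 384 kbps
codec_bitrate_kbps = 3                                        # low-bitrate neural codec

print(f"Raw PCM:      {pcm_bitrate_kbps:.0f} kbps")
print(f"Neural codec: {codec_bitrate_kbps} kbps")
print(f"Reduction:    ~{pcm_bitrate_kbps / codec_bitrate_kbps:.0f}x")
```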

In a recent paper titled “SoundStream: An End-to-End Neural Audio Codec”, Google researchers introduce a novel neural audio codec that provides higher-quality audio and encodes different sound types, including clean speech, noisy and reverberant speech, music, and environmental sounds.

SoundStream is an important step towards improving machine learning-driven audio codecs. It outperforms state-of-the-art codecs such as Opus and EVS, can enhance audio on demand, and requires deployment of only a single scalable model rather than many. SoundStream is the first neural network codec to work on both speech and music while running in real time on a smartphone CPU.

The main technical ingredient of SoundStream is a neural network consisting of an encoder, a decoder, and a quantizer, all of which are trained end-to-end. The encoder converts the input audio stream into a coded signal, which is compressed by the quantizer and then converted back to audio by the decoder. SoundStream leverages state-of-the-art solutions in the field of neural audio synthesis to deliver audio at high perceptual quality, by training a discriminator that computes a combination of adversarial and reconstruction losses, inducing the reconstructed audio to sound like the uncompressed original input. Once trained, the encoder and decoder can run on separate clients to efficiently transmit high-quality audio over a network.
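The encoder-quantizer-decoder structure described above can be sketched in a few lines of Keras. The layer widths, strides, and the toy rounding quantizer below are placeholders chosen for illustration; the actual SoundStream model uses stacks of residual convolutional blocks, a learned residual vector quantizer, and the adversarial plus reconstruction losses mentioned above.

```python
import tensorflow as tf

# Minimal sketch of the encoder -> quantizer -> decoder pipeline described
# in the text. All sizes and the scalar "quantizer" are placeholders, not
# the architecture from the paper.

def build_encoder():
    # Downsamples the waveform into a lower-rate sequence of embeddings.
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(32, kernel_size=7, strides=2, padding="same", activation="elu"),
        tf.keras.layers.Conv1D(64, kernel_size=7, strides=4, padding="same", activation="elu"),
        tf.keras.layers.Conv1D(128, kernel_size=7, strides=5, padding="same", activation="elu"),
    ])

def build_decoder():
    # Mirrors the encoder to upsample embeddings back to a waveform.
    return tf.keras.Sequential([
        tf.keras.layers.Conv1DTranspose(64, kernel_size=7, strides=5, padding="same", activation="elu"),
        tf.keras.layers.Conv1DTranspose(32, kernel_size=7, strides=4, padding="same", activation="elu"),
        tf.keras.layers.Conv1DTranspose(1, kernel_size=7, strides=2, padding="same"),
    ])

def toy_quantize(embeddings, num_levels=256):
    # Stand-in for the learned residual vector quantizer: clip each
    # embedding dimension to [-1, 1] and round it to a fixed grid.
    clipped = tf.clip_by_value(embeddings, -1.0, 1.0)
    return tf.round(clipped * (num_levels / 2)) / (num_levels / 2)

# One second of 24 kHz mono audio, shaped (batch, time, channels).
waveform = tf.random.normal([1, 24_000, 1])

encoder, decoder = build_encoder(), build_decoder()
codes = toy_quantize(encoder(waveform))   # compressed representation
reconstruction = decoder(codes)           # decoded waveform
print(codes.shape, reconstruction.shape)
```

In the real system, the number of residual vector quantizer stages can be varied at inference time, which is what allows a single trained model to serve multiple bitrates.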

As per the Google AI Blog, SoundStream will be released as part of the next, improved version of Lyra. By integrating SoundStream with Lyra, developers can leverage the existing Lyra APIs and tools for their work, gaining both flexibility and better sound quality. It will also be released as a separate TensorFlow model for experimentation.
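Once the standalone TensorFlow model is available, experimentation would follow the usual SavedModel workflow. The sketch below is hypothetical: the directory name and the encode/decode entry points are assumptions, not the published API, so check the official release for the actual signatures.

```python
import tensorflow as tf

# Hypothetical sketch of round-tripping audio through a released SoundStream
# SavedModel. The path and the encode/decode entry points are assumptions;
# the official release may expose different names or a single signature.

model = tf.saved_model.load("soundstream_savedmodel")   # illustrative path

# One second of 24 kHz mono audio in the range [-1, 1].
waveform = tf.random.uniform([1, 24_000], minval=-1.0, maxval=1.0)

codes = model.encode(waveform)            # assumed entry point: compress to codes
reconstruction = model.decode(codes)      # assumed entry point: decode to audio
```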

