Machine learning leveraging audio


Sensor data, video footage, and images are commonly used to anticipate failures in machines and mechanical systems, but audio remains a data source that has yet to be widely adopted. Experienced craftsmen have long relied on their hearing to diagnose faults: car mechanics listen to the engine to identify a problem, and senior engineers listen to machinery to do the same. Yet audio has not attained the same level of popularity as images. This raises the question of why audio has not found the same level of adoption and whether this is changing.

What is the difference?

One fundamental difference between audio and other forms of data is that sound is analog in nature, while the others are typically available in digital form. Analog signals are continuous, varying over time or space; examples include sound waves, voltage signals, and mechanical movement. They are represented as a continuous waveform and can take on any value within a certain range.

In contrast, digital signals are discrete, represented using binary code composed of combinations of 0s and 1s. Digital signals are used to represent data such as text, images, and videos.

How to work with Analog Signals?

Working with analog signals can be challenging because computers are designed to work with digital data. To convert an analog signal into a digital one, we use a process called Analog-to-Digital Conversion (ADC). This process involves two steps: sampling, which measures the signal at regular time intervals, and quantization, which rounds each measurement to one of a finite set of levels.
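The two ADC steps can be illustrated with a minimal Python sketch. It is not a model of any particular hardware converter: the function name sample_and_quantize, the 440 Hz test tone, the 8000 Hz sample rate, and the 8-bit depth are all assumptions chosen for illustration, and the input is assumed to lie in the range [-1, 1].

```python
import numpy as np

def sample_and_quantize(signal_fn, duration_s, sample_rate_hz, n_bits):
    """Sample a continuous-time function, then quantize it to 2**n_bits levels.

    signal_fn is a hypothetical stand-in for the analog source; it is
    assumed to return values in the range [-1, 1].
    """
    # Sampling: evaluate the signal at evenly spaced instants.
    n_samples = int(round(duration_s * sample_rate_hz))
    t = np.arange(n_samples) / sample_rate_hz
    x = signal_fn(t)

    # Quantization: map [-1, 1] onto the integer codes 0 .. levels - 1.
    levels = 2 ** n_bits
    codes = np.round((x + 1.0) / 2.0 * (levels - 1))
    codes = np.clip(codes, 0, levels - 1).astype(np.int64)

    # Map the codes back to amplitudes to see what quantization lost.
    x_hat = codes / (levels - 1) * 2.0 - 1.0
    return t, codes, x_hat

def tone(t):
    """A 440 Hz sine wave standing in for the analog sound."""
    return np.sin(2 * np.pi * 440.0 * t)

t, codes, x_hat = sample_and_quantize(tone, duration_s=0.01,
                                      sample_rate_hz=8000, n_bits=8)
max_error = np.max(np.abs(x_hat - tone(t)))
print(len(codes), codes.min(), codes.max(), max_error)
```

The quantization error is bounded by half of one quantization step, so adding bits shrinks the error, while raising the sample rate captures faster changes in the signal.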

Properties

Audio signals have two types of properties:

Time domain properties
Frequency domain properties

Time domain properties refer to the characteristics of a signal as it varies over time. This means that we plot the signal as a function of time, with amplitude or another measure of the signal's value on the vertical axis, and time on the horizontal axis. On the other hand, frequency domain properties refer to the characteristics of a signal in terms of its frequency content. In the frequency domain, a signal is decomposed into its constituent frequencies, and the amplitude of each frequency component is plotted against its frequency.
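As a rough illustration of the two views, the sketch below builds a signal in the time domain and then decomposes it into its constituent frequencies with NumPy's FFT routines. The 50 Hz and 120 Hz components and the 8000 Hz sample rate are assumptions chosen so that the result is easy to read.

```python
import numpy as np

sample_rate_hz = 8000
# Time domain view: one second of amplitude values indexed by time.
t = np.arange(sample_rate_hz) / sample_rate_hz
x = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Frequency domain view: amplitude of each frequency component.
spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
magnitude = np.abs(spectrum) * 2.0 / len(x)  # scale to sine amplitudes

# The two strongest components should be the ones we mixed in.
top_two = np.sort(freqs[np.argsort(magnitude)[-2:]])
print(top_two)
```

Plotting x against t gives the time domain picture, while plotting magnitude against freqs gives the frequency domain picture of the same signal.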

How to perform predictions on Sound?

To perform predictions on sound, we can use time domain features such as amplitude envelope, root mean square energy, and zero crossing rate. However, the results obtained with these features alone are not very promising. To improve the accuracy of our predictions, we also need to look at the frequency content of the signal.
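The three time domain features named above can be computed per frame with a few lines of NumPy. This is a minimal sketch using common textbook definitions; the helper names frame_signal and time_domain_features, the frame and hop lengths, and the 440 Hz test tone are assumptions, and the input is assumed to be at least one frame long.

```python
import numpy as np

def frame_signal(x, frame_length, hop_length):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_length) // hop_length
    idx = (np.arange(frame_length)[None, :]
           + hop_length * np.arange(n_frames)[:, None])
    return x[idx]

def time_domain_features(x, frame_length=1024, hop_length=512):
    frames = frame_signal(x, frame_length, hop_length)
    # Amplitude envelope: largest absolute sample in each frame.
    amplitude_envelope = np.max(np.abs(frames), axis=1)
    # Root mean square energy of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Zero crossing rate: fraction of adjacent sample pairs that change sign.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return amplitude_envelope, rms, zcr

sample_rate_hz = 16000
t = np.arange(sample_rate_hz) / sample_rate_hz
x = np.sin(2 * np.pi * 440.0 * t)  # illustrative test tone

amplitude_envelope, rms, zcr = time_domain_features(x)
print(amplitude_envelope.mean(), rms.mean(), zcr.mean())
```

For a pure sine wave the RMS energy is close to 0.707 times the peak amplitude, and the zero crossing rate is close to twice the tone frequency divided by the sample rate.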

We use the Fourier Transform to move from the time domain to the frequency domain. Using the Short-Time Fourier Transform (STFT), which applies the Fourier Transform to short, overlapping windows of the signal, we can create a spectrogram. A spectrogram is a visual representation of the frequencies of a signal over time. Recent work has found that treating spectrograms as images and feeding them into a Convolutional Neural Network (CNN) gives very promising results.
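A spectrogram can be sketched with NumPy alone by windowing the signal and applying the FFT to each frame. The function name spectrogram, the Hann window, the frame and hop lengths, and the two-tone test signal are assumptions made for illustration; libraries such as SciPy and librosa provide their own STFT routines with more options.

```python
import numpy as np

def spectrogram(x, frame_length=512, hop_length=256):
    """Log-power spectrogram with shape (frequency bins, time frames)."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(x) - frame_length) // hop_length
    idx = (np.arange(frame_length)[None, :]
           + hop_length * np.arange(n_frames)[:, None])
    frames = x[idx] * window              # windowed, overlapping frames
    stft = np.fft.rfft(frames, axis=1)    # one spectrum per frame
    power = np.abs(stft) ** 2
    # Small constant avoids log(0) in silent bins.
    return (10.0 * np.log10(power + 1e-10)).T

sample_rate_hz = 8000
t = np.arange(sample_rate_hz) / sample_rate_hz
# Test signal: 500 Hz for the first half second, 2000 Hz afterwards.
x = np.where(t < 0.5,
             np.sin(2 * np.pi * 500 * t),
             np.sin(2 * np.pi * 2000 * t))

spec = spectrogram(x)
freqs = np.fft.rfftfreq(512, d=1.0 / sample_rate_hz)
peak_first = freqs[np.argmax(spec[:, 0])]   # strongest frequency, first frame
peak_last = freqs[np.argmax(spec[:, -1])]   # strongest frequency, last frame
print(spec.shape, peak_first, peak_last)
```

The result is a two-dimensional array, so it can be handled like a single-channel image; that is what makes it a natural input for a CNN.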

Case Studies

OneWatt uses audio to diagnose motor health (https://thenextweb.com/news/machine-failure-predictive-analytics)
Skoda app uses audio to diagnose engine faults (https://www.theengineer.co.uk/content/news/skoda-app-diagnoses-faults-by-listening-to-engine-noise/)

Challenges

High level of hardware dependence involved in working with audio signals.
Difficulty in standardizing and interpreting machine-generated sounds.
Noise in real-world sound recordings, which can present a significant obstacle when training machine learning models.

Despite these challenges, the use of audio data is slowly gaining popularity. As machine learning and signal processing continue to develop, new techniques for analysing and interpreting audio data will continue to emerge. In the future, audio data may become a crucial tool for predicting machine failure and planning maintenance, used alongside other forms of data.

Sources of Article

https://en.wikipedia.org/wiki/Short-time_Fourier_transform
https://en.wikipedia.org/wiki/Fourier_transform
https://en.wikipedia.org/wiki/Envelope_(waves)
https://en.wikipedia.org/wiki/Quantization_(signal_processing)
https://en.wikipedia.org/wiki/Zero-crossing_rate
https://en.wikipedia.org/wiki/Sampling_(signal_processing)
https://thenextweb.com/news/machine-failure-predictive-analytics
https://www.theengineer.co.uk/content/news/skoda-app-diagnoses-faults-by-listening-to-engine-noise/
