Speech recognition is one of the most fascinating topics in AI. It converts spoken language into text. Speech recognition is used in many day-to-day applications in banking, healthcare, marketing, and IoT (Internet of Things). Familiar examples include Apple Siri, Amazon Alexa, and Google Assistant.
The system takes speech as input from an audio file or a microphone
It converts the physical sound into an electrical signal
It converts the electrical signal into digital data with an analog-to-digital converter
Once the audio is digitized, an ML model can be used to transcribe it into text
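To make the digitization step concrete, here is a minimal sketch that samples a pure tone and quantizes it to 16-bit integers, which is roughly what an analog-to-digital converter does. The function name, the 440 Hz tone, and the 16 kHz sample rate are illustrative assumptions and are not part of the libraries used in this article.

```python
import math


def sample_and_quantize(freq_hz, sample_rate, duration_s, bits=16):
    """Sample a sine wave and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1  # largest positive value, 32767 for 16 bits
    n_samples = round(sample_rate * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate  # time of the n-th sample in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # "analog" value in [-1, 1]
        samples.append(round(amplitude * max_level))  # quantized integer value
    return samples


# Hypothetical input: 10 ms of a 440 Hz tone sampled at 16 kHz
digital = sample_and_quantize(440, 16000, 0.01)
print(len(digital), digital[:5])
```

A real recording follows the same idea: the waveform is measured at a fixed rate, and each measurement is stored as an integer.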
Machine learning and deep neural network models are used to convert the audio into text. Explaining how these models work is beyond the scope of this article. Here, I explain how to convert speech into text using Python. I used the SpeechRecognition library and the PyAudio library to do the conversion.
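The SpeechRecognition library supports several engines and APIs, including CMU Sphinx (works offline), Google Speech Recognition, Google Cloud Speech API, Wit.ai, Houndify API, and IBM Speech to Text.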
For more details, please check the SpeechRecognition documentation. In this article, I used the Google Speech Recognition API.
Installing the SpeechRecognition and PyAudio Python libraries:
https://jovian.ml/sdhilip/untitled20/v/13&cellId=0
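Both libraries are typically installed with pip, for example `pip install SpeechRecognition PyAudio`. As a quick sanity check, the sketch below reports which of the two modules Python cannot find. The helper name `missing_packages` is hypothetical; `speech_recognition` and `pyaudio` are the import names of the two packages.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# An empty list means both libraries are importable
print(missing_packages(["speech_recognition", "pyaudio"]))
```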
Code:
https://jovian.ml/sdhilip/untitled20/v/2&cellId=1
I used a dialogue clip from the English movie Taken (the I-dont-know.wav file)
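Since the embedded notebook cell may not be visible here, the following is a minimal sketch of what transcribing an audio file with the SpeechRecognition library can look like. It is not necessarily the exact code from the notebook. The function name `transcribe_file` is my own; `Recognizer`, `AudioFile`, `record`, and `recognize_google` come from the library.

```python
def transcribe_file(path):
    """Transcribe a WAV file with the Google Speech Recognition API."""
    # Imported inside the function so this file loads even if the
    # library is not installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # AudioFile is used as a context manager; record() reads the whole file
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    # Sends the audio to Google's service, so this needs an internet connection
    return recognizer.recognize_google(audio)
```

For the clip used here, the call would be `print(transcribe_file("I-dont-know.wav"))`, assuming the file is in the working directory.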
Output
Let’s try some of our own languages.
First I am trying Tamil. We don’t need to change the entire code; we just need to add the language option to recognize_google and change the audio file. The language code for Tamil is “ta-IN”, for Hindi “hi-IN”, for Telugu “te-IN”, and for Malayalam “ml-IN”. For more details, please check the SpeechRecognition documentation.
https://jovian.ml/sdhilip/untitled20/v/5&cellId=2
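As a rough sketch of the change described above, the language codes can be kept in a small lookup table and passed to `recognize_google` through its `language` parameter. The table `LANGUAGE_CODES` and the function `transcribe_file_in` are hypothetical names for illustration; the "en-US" entry is the library's default and is my addition.

```python
# Language codes from this article, plus the library default for English
LANGUAGE_CODES = {
    "English": "en-US",
    "Tamil": "ta-IN",
    "Hindi": "hi-IN",
    "Telugu": "te-IN",
    "Malayalam": "ml-IN",
}


def transcribe_file_in(path, language_name):
    """Transcribe a WAV file spoken in the named language."""
    # Imported lazily so the lookup table is usable without the library
    import speech_recognition as sr

    code = LANGUAGE_CODES[language_name]  # raises KeyError for unknown names
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    # The only change from the English version is the language argument
    return recognizer.recognize_google(audio, language=code)
```

A call would look like `transcribe_file_in("tamil-clip.wav", "Tamil")`, where the file name is a placeholder for your own recording.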
Output:
For Hindi:
https://jovian.ml/sdhilip/untitled20/v/8&cellId=2
Output
We have converted speech from an audio file into text with the code above. How do we convert speech captured from the microphone into text? To do that, we need the PyAudio library, which lets Python read audio input from the microphone.
The code is almost the same; the only change is that we use the Microphone class instead of an audio file as the source.
Code:
https://jovian.ml/sdhilip/untitled20/v/10&cellId=7
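For readers who cannot open the embedded cell, here is a minimal sketch of the microphone variant. It is an approximation and may differ from the notebook code. The function name and the one-second calibration are my assumptions; `Microphone`, `adjust_for_ambient_noise`, `listen`, and `recognize_google` are part of the SpeechRecognition library, and `Microphone` relies on PyAudio being installed.

```python
def transcribe_microphone(language="en-US", calibration_seconds=1):
    """Record one phrase from the default microphone and transcribe it."""
    # Imported lazily; Microphone also requires PyAudio at runtime
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so quiet speech is still detected
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        print("Speak now...")
        # listen() records until it detects a pause in speech
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio, language=language)
```

Passing a different code, for example `transcribe_microphone("ta-IN")`, switches the recognition language in the same way as for audio files.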
I said: “Corona changed the world completely”
Output
In Tamil:
We just need to add the language option for Tamil, “ta-IN”, the same as with the audio file
I said “Welcome, how are you” in Tamil, and it was transcribed accurately
https://jovian.ml/sdhilip/untitled20/v/13&cellId=11
Output
In Hindi
https://jovian.ml/sdhilip/untitled20/v/12&cellId=10
I said “What is your name” in Hindi
Output
In Telugu
https://jovian.ml/sdhilip/untitled20/v/13&cellId=8
I said “How are you” in Telugu
Output
In Malayalam
https://jovian.ml/sdhilip/untitled20/v/12&cellId=9
I said “Where are you from” in Malayalam
Output
This is one of the simplest ways to convert speech into text using the Google Speech Recognition API, and it is very useful for NLP projects. Please note that the Google Speech Recognition API requires an internet connection to operate. Please try other languages and explore.
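Because the service needs an internet connection, it can help to handle failures explicitly. The sketch below wraps the recognition call and returns None when the speech is unintelligible or the request fails. The helper name `safe_transcribe` is hypothetical; `UnknownValueError` and `RequestError` are the exceptions the SpeechRecognition library raises in these two cases.

```python
def safe_transcribe(recognizer, audio, language="en-US"):
    """Return the transcript, or None if recognition fails."""
    # Imported lazily so the helper can be defined without the library
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service responded but could not make out any words
        print("Could not understand the audio")
    except sr.RequestError as exc:
        # The service could not be reached, e.g. no internet connection
        print(f"Request to the speech service failed: {exc}")
    return None
```

Here `recognizer` is an `sr.Recognizer()` instance and `audio` is the object returned by `record` or `listen`.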
Reference:
Image Source: Loginworks.com