Bidisha Sharma is a lead speech scientist at Uniphore. 

Bidisha was a post-doctoral research fellow at the Electrical and Computer Engineering Department, National University of Singapore, Singapore. 

Her research focuses on voice processing, specifically text-to-speech synthesis, speech enhancement, recognition, and language processing. Additionally, she has experience with music and singing voice technology applications.

INDIAai interviewed Bidisha to get her perspective on AI.

The global speech-to-text API market is projected to reach $5.8 billion by 2027, expanding at a CAGR of 19.0 per cent during the forecast period. What role will Indian industry and research play in that future?

I see a considerable role for India in the global speech-to-text API market, from several perspectives. India's academic and industry sectors are well connected in terms of talent and computing support. This connection helps build a skilled workforce and moves research and innovation forward on the global stage. Besides Hindi and English, the official languages of India, we have regional languages across all the states, and often multiple languages are spoken in different regions of the same state. With the advent and rapid growth of mobile devices and other interactive platforms, it is now critical to develop these technologies in different languages so that speech-to-text APIs reach India's diverse end-users. Indian industry and academia are actively advancing speech-to-text API research and development in both Indian and global languages.

What inspired you to research text-to-speech synthesis? Who is your role model?

I started working on text-to-speech synthesis after my BTech in 2012, and a year later I joined the PhD program in the same area at IIT Guwahati. Initially, I worked on a project to develop a text-to-speech synthesis system for Assamese, the regional language of Assam, where I was born and brought up. Apart from my strong interest in speech and signal processing, I was strongly motivated to develop the technology in my native language. In my text-to-speech synthesis research during my PhD, I followed the work of Alan W Black, Simon King, and many other researchers and groups around the globe.

Can you describe your doctoral research on text-to-speech synthesis systems?

In my doctoral research, I worked on bridging the gap between the human voice and the synthesized voice produced by a text-to-speech synthesis system. I used my analysis and knowledge of acoustic-phonetic cues to make synthesized speech sound more natural, intelligible and appealing to the listener. My doctoral thesis presents an extensive study of sonorant sounds and, from it, the extraction of sonority features from the speech signal. Later, I applied this acoustic-phonetic notion of sonority in different speech-based applications. I also used a range of statistical models during my PhD, from traditional machine learning models to various neural networks. You can find my PhD thesis here.

In your opinion, what disadvantages do Indian research and researchers face in text-to-speech synthesis?

India has a strong pool of talented individuals, which has driven rapid growth in research and development in recent years in both academia and industry. To further facilitate research in AI and other domains, it is essential to strengthen scientific training and ensure sufficient interaction between academia, industry, and government research organizations. To attract more talented individuals to a career in research, attractive scholarships are essential, along with opportunities such as attending conferences and workshops, internships, and collaboration with world-class universities. Furthermore, pursuing a PhD is a long-term commitment, so the scholarship should be enough for the candidate to support family expenses.

What challenges did you encounter when developing text-to-speech synthesis systems for the Assamese language? How did you overcome them?

Having been born and brought up in Assam, I was very enthusiastic about developing a text-to-speech synthesis system for Assamese, which was then technologically underdeveloped, with very few resources available on the web. As for the challenges, in 2012 there was no publicly available data for developing the technology in Assamese. We had to create a new studio-quality database covering different speakers and different domains, which meant audio recording, manual processing, transcribing, and making the raw data usable for the task. Coming from an engineering background, I also had to analyze and understand the spoken syllable structure and other phonetic aspects of Assamese. It was a rich learning experience, but it was also challenging to work in several directions at once.

Some researchers find it hard to identify a research gap. How did you figure out your research problem?

In the early stage of research, it is critical to find a research problem that is original and innovative. Reading widely in the literature of your chosen area is essential. By reading top journals and conference papers, analyzing published results, and thinking critically about existing literature, one becomes familiar with recent work and comes up with ideas that haven't been thought of before. You need to identify, from the research literature, which known problems remain unsolved. Talking with supervisors and experts in the field is also an excellent way to learn more about the research gap. During the initial phase of my PhD, I did a literature review and discussed my thoughts with my supervisor, Prof S R Mahadeva Prasanna, who helped me learn and focus on a particular direction to find my research problem.

In your view, what is indispensable for doing research in AI?

AI research is essentially a multi-disciplinary activity. Extensive knowledge of machine learning, statistics, and probability, along with a passion for a particular research area, is critical to doing research in AI. A good grounding in the basics of the field of interest, such as speech, image, and other signals, always helps in digging into problems more effectively and finding AI-based solutions. Fluency in a high-level programming language such as Python and in tools such as PyTorch and TensorFlow is also essential for implementing ideas efficiently.

What advice would you give to those who aspire to pursue a career in artificial intelligence?

To start a career in AI, you should be interested in machine learning, probability, and statistics, and have domain-specific knowledge. Strong knowledge of algorithms and the frameworks that implement them helps you build AI models and run machine learning pipelines more efficiently. Along with these, it is critical to know Python programming and to try coding simple baselines to get a feel for the problem. Finally, spend a lot of time reading recent research and thinking about what current methods don't do well enough. Working in AI is a continuous, progressive learning process: you keep up with the latest work published by experts worldwide, implement it, and analyze the results to help generate new ideas.

Could you name some important research articles and books that inspired you?

Several conferences (AAAI, ICASSP, ASRU, Interspeech, SLT and others) and journals publish new ideas and solutions. Following these research publications helps immensely in finding new solutions and making progress in AI research. Among books for learning basic speech processing, I followed Introduction to Digital Speech Processing by Rabiner and Schafer and Fundamentals of Speech Recognition by Rabiner and Juang. For machine learning, the book Pattern Recognition and Machine Learning by Christopher Bishop is extremely helpful. Apart from these, there are practical online courses on Coursera, LinkedIn Learning, and other digital platforms.
