Dr Neethu Mariam Joy is an AI Scientist at ZS, where she is actively working on natural language processing projects. She earned a PhD in Speaker Normalization, Neural Networks, and Automatic Speech Recognition at the Indian Institute of Technology, Madras.

She is particularly interested in the acoustic modelling of limited-resource languages. She is presently working on strategies for unsupervised speaker normalization using deep neural networks.

INDIAai spoke with Dr Neethu Mariam Joy recently to learn about her research path and the insights she gained along the way.

What prompted you to pursue research in automatic speech recognition (ASR) after graduating with honours in electronics and communication engineering?

During my MTech days, my favourite subject was digital signal processing, so when I applied for a PhD at IIT Madras, I wanted to deepen my knowledge in this field. At that time (back in 2011), an admission committee matched each incoming PhD student to a guide. I was allocated to Dr Umesh Srinivasan, whose research specialization was speech recognition and speaker normalization. I wasn't even aware that such a subdivision of signal processing existed. Around the same time, AI was undergoing a revival and showing great promise in ASR. I found myself right in the middle of these exciting changes, where I got to dabble in both speech processing/recognition and machine learning.

Can you tell us a bit about your Speaker Normalization research and its potential in the future?

While building an ASR system, we collect data from many speakers with varying characteristics: age, gender, speaking rate, fluency, accent and so on. Speaker normalization removes the variability these speaker characteristics introduce into ASR. However, it typically requires a considerable amount of speech data per speaker. At training time, provided we know the speakers' identities, computing such speaker normalization matrices or vectors is not an issue. But at test time, especially for IVR systems where test files are short, we have to treat each test file as coming from a new, unknown speaker, and reliably estimating a speaker normalization matrix or vector for it is not possible. Our research therefore focused on ways to perform speaker normalization in such scenarios.
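The thesis methods themselves are not reproduced here, but the simplest and most widely used form of unsupervised, per-utterance normalization is cepstral mean and variance normalization (CMVN): each utterance's feature matrix is shifted and scaled using only its own statistics, so no speaker identity is needed. A minimal NumPy sketch (the function name `cmvn` is illustrative, not from the interview):

```python
import numpy as np

def cmvn(features: np.ndarray) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization.

    features: (num_frames, num_coeffs) matrix of e.g. MFCCs for ONE utterance.
    Returns features with zero mean and unit variance per coefficient,
    computed from this utterance alone (no speaker labels required).
    """
    mean = features.mean(axis=0)          # per-coefficient mean over frames
    std = features.std(axis=0)            # per-coefficient std over frames
    return (features - mean) / (std + 1e-10)  # epsilon guards constant dims
```

For very short test utterances (the IVR case mentioned above), even these per-utterance statistics become unreliable, which is exactly the gap that learned, neural approaches to speaker normalization aim to close.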

What was the most challenging part of the transition from electronics and communication to AI research?

In my case, I was dealing with two new subjects: (1) speech processing/recognition and (2) machine learning. I had no prior knowledge or experience in either field when I started my PhD. So learning the nuances of each domain individually, and finding the areas of overlap between them, was a steep uphill climb for me.

Can you tell us more about your role as an AI scientist at ZS?

In 2020, ZS set up its AI Centre of Excellence division, and I work as an AI scientist there. At ZS, I have learned and worked in Natural Language Processing (NLP). My work focuses on developing in-house tools for various NLP pipelines, which other ZSers can then use in client projects.

Is programming knowledge necessary for professionals interested in an artificial intelligence career?

The short answer is yes. Implementing the concepts we learn goes hand in hand with the learning process. AI applied to speech, NLP or computer vision is an applied research field, so we need the means to implement the ideas we learn. Also, when we code up a concept, we gain more insight than a purely textual study would give us.

What advice would you give to students and professionals interested in pursuing a career in AI? Especially for professionals and students who do not have an IT background.

Your interest and perseverance will be the two traits that help you here. It might seem daunting initially, and it might take numerous iterations to understand a concept. If you cultivate an approach where you code up what little you have learned, that will keep the interest alive and push you to learn more. Even if you don't know how to code, it's never too late to invest some time in a beginner-friendly language like Python. Get a rudimentary understanding of the language, then try to implement an ML concept you have learned with the knowledge you have. With more iterations like this over time, you will see a remarkable improvement in both your coding skills and your grasp of AI concepts.
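As one concrete instance of this learn-by-implementing loop, here is the kind of first exercise the advice points at: fitting a straight line with gradient descent in plain NumPy. Everything here (the toy data, learning rate, step count) is illustrative, not from the interview:

```python
import numpy as np

# Toy data: y = 2x + 1 with a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(100)

# Fit y = w*x + b by gradient descent on the mean squared error
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    pred = w * x + b
    grad_w = 2.0 * np.mean((pred - y) * x)  # d(MSE)/dw
    grad_b = 2.0 * np.mean(pred - y)        # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b
# w and b should now be close to the true slope 2 and intercept 1
```

A dozen lines like these, written after reading about gradient descent, teach more about learning rates and convergence than the chapter alone ever does.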

What, in your opinion, would be the near-future breakthroughs in Automatic Speech Recognition?

Often we isolate ASR from NLP and treat them as two separate modules. However, in any practical application, the system needs ASR to transcribe what is spoken into text and NLP to process that text into useful information. Instead of treating these as two isolated tasks, we should be thinking of ways to merge them into a single speech-language processing and understanding unit.

Can you recommend some AI books and research articles for people just getting started with AI?

For beginners, I would suggest Andrew Ng's Coursera materials. These are a good mix of theory and practical coding exercises that can give you a bird's-eye view of machine learning. Deep Learning with Python by François Chollet and Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow by Aurélien Géron are exciting reads for beginners.
