Sudip Bhattacharya has over 23 years of experience applying innovative technologies for business solutions.

He has worked extensively in cutting-edge technology across multiple countries, including India, the USA, France, Taiwan, and Japan.

INDIAai interviewed Sudip to get his perspective on AI.

What initially sparked your interest in AI?

In college, we were asked to solve the Travelling Salesman Problem (TSP). In this problem, there are n cities, and travelling between any two cities incurs a cost; the salesman must visit every city while keeping the cumulative cost minimal.

At first, it looked like an easy question, especially for 4 or 5 cities. However, the book mentioned a different and exciting way to solve it using neural networks. That was my first step into building solutions with neural networks, which later led to a serious interest in AI. I am still solving that TSP - a combinatorial optimization problem of finding the best option out of a finite set - only the business application has evolved, and the scale has become massive.
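As a side note, a minimal sketch (not the interviewee's code) of why TSP is a hard combinatorial optimization problem: for a handful of cities you can simply try every tour, but the number of tours grows factorially, which is why heuristic and neural approaches become attractive at scale. The cost matrix below is purely illustrative.

```python
# Illustrative sketch: exact TSP by brute force over tours.
# The (n-1)! blow-up is what motivates heuristic/neural approaches for large n.
from itertools import permutations

# Hypothetical symmetric cost matrix for 4 cities (cost[i][j] = travel cost).
cost = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]

def tour_cost(order):
    """Cumulative cost of visiting cities in `order` and returning to the start."""
    return sum(cost[a][b] for a, b in zip(order, order[1:] + order[:1]))

def brute_force_tsp(n):
    """Exact solution: try every tour that starts at city 0."""
    best_order, best_cost = None, float("inf")
    for rest in permutations(range(1, n)):
        order = [0, *rest]
        c = tour_cost(order)
        if c < best_cost:
            best_order, best_cost = order, c
    return best_order, best_cost

print(brute_force_tsp(len(cost)))  # -> ([0, 1, 3, 2], 80) for this matrix
```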

Tell us about your Vernacular Speech AI project.

We speak the way our instincts and environment shape our dialect. Our expressions, voice, speech delivery, and vocal characteristics are unique. When we learn and speak other languages, our mother tongue or vernacular dialect influences our pronunciation, tone, and speed of speaking. The heterogeneity is non-trivial: pronunciation, tone, speed, loudness, vocabulary, and grammar change with every individual and across geographical regions, sometimes every few kilometres. My current project on Vernacular Speech is to create an automatic software system that evaluates the abstract characteristics of spoken vernacular language and infers confidence, quality of speech delivery, fluency, and proficiency.

The challenge is defining the "correct" reference and benchmark for the AI model's training and test data. However, it is a significant problem to solve for creating globally equitable access to systems irrespective of linguistic background. Our project explicitly covers at least 10 Indian languages, analyzed at the level of speech to detect correctness for applied use cases like:

  • Communication skills required for job interviews
  • Customer interaction
  • Situational response maturity
  • Unbiased local language proficiency check

To do this, we decided to train our models to understand the characteristics of vernacular speech instead of using the traditional method of fine-tuning an out-of-the-box model.

The project has been split into multiple segments:

  • Training and test data collection. Public data with Indian-language recordings of sufficient duration is difficult to get.
  • Feature extraction
  • Model algorithms and training using deep neural networks (a minimal sketch follows this list)
  • Model test and evaluation
  • Deployment and continuous feedback
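To make the feature-extraction and deep-neural-network segments concrete, here is a minimal, purely illustrative sketch, not the project's actual pipeline: it assumes MFCCs as the acoustic features and a made-up three-class fluency label per clip. The file names and labels are hypothetical.

```python
# Illustrative sketch of "feature extraction" + "training a deep neural network"
# on raw speech clips. Assumes librosa and PyTorch; labels/paths are invented.
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_features(wav_path, n_mfcc=13):
    """Load a clip and summarise it as the mean MFCC vector (one row per clip)."""
    signal, sr = librosa.load(wav_path, sr=16_000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                     # (n_mfcc,)

# Hypothetical labelled clips: 0 = beginner, 1 = intermediate, 2 = fluent.
clips = [("clip_hi_01.wav", 2), ("clip_bn_07.wav", 1), ("clip_ta_03.wav", 0)]
X = torch.tensor(np.stack([extract_features(p) for p, _ in clips]), dtype=torch.float32)
y = torch.tensor([label for _, label in clips])

# A small feed-forward network standing in for the deep model.
model = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, 3))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):  # toy training loop
    optimiser.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimiser.step()
```

A real system would use far richer features and much larger data, but the shape of the segments is the same: turn raw speech into features, then train and evaluate a deep model on them.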

Each segment has unearthed new avenues for exploration beyond speech-to-text, using speech directly as the raw input. We are still working to create a generic model that caters to a large segment of India's linguistic variety.

What are the most prevalent misconceptions you'd like to dispel as a long-time AI and machine learning community member?

From my discussions with people beyond the machine learning community, the most prevalent misconceptions are:

  • AI will take away many jobs
  • AI may soon turn into a rogue robot, as in science-fiction movies

More than dispelling these misconceptions, empowering people with knowledge of AI and its limits will bring trust, endorsement, and acceptance. Additionally, guardrails and regulatory transparency, once accessible to people, will enable us to unleash the power of AI to improve our lives.

What is your opinion on vernacular speech challenges for India's complex linguistic structure?

I can speak from the experience we are going through as we build one of the few direct speech-ingestion and inference systems that detect the characteristics of Indian vernacular languages.

  • The Indian context is almost like the socio-cultural, demographic, and linguistic diversity of many countries bundled into one; creating a close-to-"correct" model for all Indian languages is complex.
  • More was required than collecting and annotating speech from YouTube, social media, or the few public datasets available.
  • Comprehensive, large datasets for training machine learning models are not easy to obtain or create.
  • Biases in training data annotations are difficult to manage.
  • Only limited pockets of research are available or accessible for Indian vernacular speech.

However, we see initiatives in the right direction from various stakeholders and look forward to robust, comprehensive vernacular speech datasets and ML models emerging soon.

What advice would you provide to someone considering a career in AI research? What should they concentrate on to advance?

AI research is no longer just an exciting area; it is necessary for society's benefit. India's challenges, demographics, and opportunities offer immense scope for research and for bringing real AI applications to life.

I suggest that AI researchers, particularly in our country, focus on the following:

  • Building a sound foundation in Mathematics and Computer Science
  • Collaborating with real-life application builders (both companies and individuals)
  • Prioritizing research topics that tangibly and positively impact today's life
