Sudip Bhattacharya has over 23 years of experience applying innovative technologies for business solutions.
He has worked extensively in cutting-edge technology across multiple countries, including India, the USA, France, Taiwan, and Japan.
INDIAai interviewed Sudip to get his perspective on AI.
In college, we were asked to solve the Travelling Salesman Problem (TSP). In this problem, there are n cities, and travelling between any pair of cities incurs a cost. The salesman must visit every city with minimal cumulative cost.
At first, it looked like an easy question, especially for 4 or 5 cities. However, the book mentioned a different and exciting way to solve it using neural networks. That was my first step into building solutions with neural networks, which later led to a serious interest in AI. I am still solving that TSP - a combinatorial optimization problem of finding the best option out of a finite set - though the business application has evolved, and the scale has become massive.
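As a rough illustration of why the problem gets hard, here is a minimal Python sketch (my own example, with made-up city coordinates) that brute-forces the optimal tour; the number of tours grows factorially with the city count, which is exactly why heuristic and neural approaches become attractive at scale:

```python
import itertools
import math

# Hypothetical city coordinates; any (x, y) pairs would do.
cities = [(0, 0), (1, 5), (4, 3), (6, 1), (3, 7)]

def tour_cost(order):
    """Total distance of visiting cities in `order` and returning to the start."""
    return sum(
        math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

# Brute force: check all (n-1)! tours starting from city 0.
# Feasible for 4-5 cities, hopeless at scale.
best = min(
    itertools.permutations(range(1, len(cities))),
    key=lambda rest: tour_cost((0,) + rest),
)
print("best tour:", (0,) + best, "cost:", round(tour_cost((0,) + best), 2))
```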
Our instincts and environment shape the way we speak. Our expressions, voice, speech delivery, and vocal characteristics are unique. When we learn and speak other languages, our mother tongue or vernacular dialect influences our pronunciation, tone and speed of speaking. The heterogeneity is non-trivial: pronunciation, tone, speed, loudness, vocabulary, and grammar change with every individual, every few kilometres, and across geographical regions. My current project on vernacular speech is to create an automatic software system that evaluates the abstract characteristics of spoken vernacular language and infers the speaker's confidence, quality of speech delivery, fluency and proficiency.
The challenge is defining the "correct" reference and benchmark for the AI model's training and test data. However, it is a significant problem to solve for creating globally equitable access to such systems irrespective of linguistic background. Our project explicitly covers at least 10 Indian languages, analyzed at the level of speech to detect correctness for a range of applied use cases.
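To give a feel for what "abstract characteristics of speech" might mean in practice, here is a minimal sketch using the open-source librosa library; the prosodic proxies below (pitch variation, loudness, onset rate) are my own illustration under assumed choices, not the project's actual feature set:

```python
import numpy as np
import librosa

def speech_features(path):
    """Extract rough prosodic proxies for speech-delivery characteristics."""
    y, sr = librosa.load(path, sr=16000)  # mono audio at 16 kHz
    # Pitch contour via pYIN; its variation loosely tracks intonation.
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    # Frame-by-frame energy as a loudness proxy.
    rms = librosa.feature.rms(y=y)[0]
    # Onset rate as a crude speaking-rate proxy.
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    duration = len(y) / sr
    return {
        "pitch_mean_hz": float(np.nanmean(f0)),   # nanmean skips unvoiced frames
        "pitch_std_hz": float(np.nanstd(f0)),
        "loudness_mean": float(rms.mean()),
        "onsets_per_sec": len(onsets) / duration,
    }
```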
To do this, we decided to train our models to understand the characteristics of vernacular speech instead of using the traditional method of fine-tuning an out-of-the-box model.
The project has been split into multiple segments.
Each segment has unearthed new avenues to explore beyond speech-to-text, using speech directly as the raw input. We are still working to create a generic model catering to a large segment of Indian linguistic variety.
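A toy PyTorch sketch of what "speech as the raw input" can look like: a small network that consumes log-mel spectrograms and predicts a delivery trait, rather than operating on transcripts. The architecture and the three-way label are hypothetical, chosen only to show the shape of such a model:

```python
import torch
import torch.nn as nn

class SpeechTraitClassifier(nn.Module):
    """Toy model: log-mel spectrogram in, fluency class out."""
    def __init__(self, n_mels=64, n_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # pool over time and frequency
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, mel):                   # mel: (batch, 1, n_mels, frames)
        z = self.conv(mel).flatten(1)         # (batch, 32)
        return self.head(z)                   # class logits

model = SpeechTraitClassifier()
dummy = torch.randn(2, 1, 64, 200)            # two fake spectrogram clips
print(model(dummy).shape)                      # torch.Size([2, 3])
```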
From what I hear in discussions with people beyond the machine learning community, several misconceptions about AI remain prevalent.
More than dispelling these misconceptions, empowering people with knowledge of AI and its limits will build trust, endorsement and acceptance. Additionally, guardrails and regulatory transparency, once accessible to people, will enable us to unleash the power of AI to improve our lives.
I can speak from the experience we are going through as we build one of the few direct speech ingestion and inference systems to detect characteristics of Indian Vernacular Languages.
However, we see initiatives in the right direction from many stakeholders, and we look forward to robust, comprehensive vernacular speech datasets and ML models emerging soon.
AI research is no longer just an exciting area; it is necessary for society's benefit. India's challenges, demographics and opportunities offer immense scope for research and for bringing real AI applications to life.
I would suggest, and indeed expect, AI researchers, particularly in our country, to have the following: