India is a melting pot of multiple cultures, religions, diaspora and languages. Although 22 languages are officially recognised, more than 100 languages and dialects are spoken across the country. In the past decade, India has witnessed stupendous digital growth: in 2019, the number of smartphone users in rural areas surpassed that of urban India. There is a burgeoning market for digital products that extends well beyond urban pockets. However, less than 1% of content on the Internet is in Indian languages. The 500 million-odd smartphone users in India (and counting) are starved for content in the languages they speak and understand, and English is not one of them.

This is a unique opportunity for India to build a distinct Indic language translation platform, along with related services and products, by leveraging the power of AI. This is the essence of the National Language Translation Mission (NLTM). Announced during the Union Budget of 2021-22, the NLTM is expected to make the wealth of governance- and policy-related knowledge on the Internet available in major Indian languages.

At the ongoing NLP Week, Manish Gupta, Director of Google Research India, spoke about the impending opportunity for India's AI fraternity to democratise information in any language. "Be it a labourer's son in Chhattisgarh or a farmer's daughter in rural Rajasthan, anyone can access vital information online in a language of their choice," he said. With heightened smartphone proliferation, availability of 5G, expanding WiFi services in villages and growing digital literacy, India has an unprecedented opportunity to create a blueprint for building the Internet for local languages, added Professor Rajeev Sangal, renowned researcher in machine translation and AI. This will provide an immense boost to the country's R&D capabilities as well, he noted.

Currently, speech recognition and voice-based technologies are in their infancy. Most language data exists in text format, and so far, extensive work has been done in developing text-to-text datasets. Speech datasets are still few in number. In addition, challenges exist in labelling data across languages and in decoding semantics and syntax. There is wide-ranging potential for AI, ML, OCR, NLP and other technologies to make speech recognition sharper and less error-prone. Moreover, many of these problems are unique to India given the complexity and range of its languages. This gives Indian researchers and AI scientists a distinct opportunity to create a novel language translation system, which can then be adapted for foreign languages, explained Prof. Sangal.

In the next three to four years, the NLTM hopes to build automatic speech recognition, rapid machine translation and Text-To-Speech (TTS) with accurate intonation. Eventually, the aim is to apply these developments to all Indian languages (beyond the 22 official languages). This requires a concentrated and coordinated effort that includes startups, researchers, scientists, academia and industry, added Prof. Sangal. Expansive cross-sector participation will encourage the disbursal of government funds, especially for "challenge rounds" where startups and researchers can solve particular use cases.

"The NLTM is addressing a truly unique issue of language uniformity on the Internet, which has grown unilaterally all these years and favoured a small section of people. Through this initiative, India can emerge a leader in Speech-to-Speech Translation services and innovation, and can inspire other countries to embark on similar missions to further their native languages," added Prof. Sangal.
