While artificial intelligence (AI) has made great strides over time, it is critical to pause and reflect on how far India, as a nation, has progressed in dealing with socio-economic challenges. Several efforts are under way to make India digitally inclusive, but even today, access to infrastructure, including the internet, remains one of the barriers.

Moreover, only 10% of the country speaks and understands English; the majority communicate in hundreds of other languages. Fortunately, growing reliance on voice-enabled technology and vernacular languages is likely to help bridge the gap.

As part of INDIAai's LinkedIn Live, Jibu Elias, Content Head at INDIAai, spoke with Vivekanand Pani, Co-founder of Reverie Language Technologies, and Harsh Singhal, Head of ML at Koo India, to understand why AI-powered language tech is crucial to empowering citizens, the challenges along the way, and what the future holds.

The significance of AI-powered language tech

Kickstarting the conversation, Pani recalled that when Reverie started out, most devices could not even display Indian-language characters. There were also early doubts about whether people who cannot read English would be willing to pay for it.

"We were convinced that the 95% of Indians who are not comfortable with English can't even recognise the letters, so how will they be able to use the digital medium? We can't avoid the move towards the adoption of technology, but these people will not be able to manage without their languages. Currently, there are several groups working towards building data, and these are critical in making better solutions," he shared.

Singhal, for his part, said that Koo, the microblogging and social networking platform, has always understood that access to information in local languages is paramount. That is also how the team built Vokal, India's first voice-enabled question-and-answer platform, which has gained traction across Indian languages.

"Koo was born out of that sentiment, where people could express themselves in text and multimedia. The whole idea was that the western walled gardens of social media products don't allow Indians or even native language speakers to access a fully immersive experience," explained Singhal. 

Leading with technology for solutions 

The Koo app brings together a collection of technologies built around social media, microblogging, and the expression of opinion. According to Singhal, the platform offers multiple features that let users translate content into the supported languages.

"We have done the most challenging piece of finding content in different languages and separating it out into various categories, subjects, and interest areas. This tells people that they don't have to come to the platform and hunt. You can get what you want just by letting us know what you are interested in, and we will surface it for you," shared Singhal, adding that it's not just content; Koo also facilitates connections with leaders across fields, who are seen as role models.

Recounting the challenges that Reverie faced during its initial days, Pani shared that even though the internet came to India in 1995, there were no phones, tablets, or other easy-to-use devices to serve as a primary connection. Even when such devices did arrive, they did not support Indian languages until two decades later.

"If a device does not even have the ability to let you type your language, there's no way you can engage on the internet in a language of your choice. Most of these Indian languages today are low-resource languages. It would have been so much better had the operating system architecture kept languages out of it, so that we could have added several more," said Pani. 

This is why Reverie built technology that enables Indian-language support at the operating-system level, so that more device makers could adopt it. After that, the company began working on cloud services, offering language as a service to provide multilingual support dynamically. By 2016-17, it had started releasing voice APIs, and it now offers the entire stack for Indian languages.

Addressing the need for data for machine learning 

Koo, Singhal believes, is in a privileged position because it sits on a large amount of data. He said the company faces no challenge in accessing data; the roadblock is the absence of state-of-the-art machine learning capabilities for Indian languages.

"For instance, in the case of Hindi, there's a lot of information out there and we leverage many technologies, including monolingual embeddings for Hindi, or multilingual embeddings that large social media companies, or even AI for Bharat have been providing as open source tech. We understand our users and content better than anyone else, which gives us an advantage in trying to structure what we want to extract from these pieces of content," shared Singhal. 

Although Reverie has been a pioneer in the language tech space, it faced several challenges as a first mover. For instance, even when the team wanted to build datasets, there was no publicly available information to draw on. The only way to create data was to pay for it, and even then, the question remained whether people had the tools to produce it.

"If a computer doesn't allow you to type in a local language, you have to facilitate through alternate tools. So, we faced challenges in even building the basic blocks. But when you look at data available today, Indian languages are not as trivial as English usage on the internet; people write a local language using English, so that would be phonetic English," said Pani.

Because of this, text in a local language cannot easily be identified within the large volumes of data collected from different sources. Models are needed to identify such text and automate the process, Pani added, noting that most Indian-language users do not know what some letters mean, so they end up typing noisy text.
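To see why romanized text is a problem, consider the simplest possible check: finding which Unicode script a piece of text is written in. The minimal Python sketch below uses a hypothetical helper, `dominant_script`, which is not part of Reverie's or Koo's tooling. It correctly flags Devanagari or Tamil text, but Hindi typed in "phonetic English" comes back as Latin, exactly like English itself.

```python
import unicodedata


def dominant_script(text: str) -> str:
    """Return the most frequent Unicode script among the letters in `text`.

    Hypothetical helper for illustration only. The script is taken from the
    first word of each character's Unicode name, e.g. 'DEVANAGARI LETTER NA'
    -> 'DEVANAGARI'. Combining vowel signs are not letters (str.isalpha() is
    False for them), so they are skipped.
    """
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    # Pick the script with the most letters; no letters at all -> UNKNOWN.
    return max(counts, key=counts.get) if counts else "UNKNOWN"


if __name__ == "__main__":
    print(dominant_script("नमस्ते"))                  # Hindi in Devanagari
    print(dominant_script("namaste aap kaise hain"))  # romanized Hindi
    print(dominant_script("hello how are you"))       # English
```

A script check like this is cheap, but it can only serve as a first filter: separating romanized Hindi from English requires a model trained on that kind of noisy, mixed text.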

Solving the crisis

Another common challenge is that published research tends to use English-language data. Pani explained that English and Indian languages differ significantly in their properties, which is why algorithms and mathematical models developed for English do not carry over cleanly.

Is there a substantial solution to this crisis? Singhal said the focus should be on building datasets and benchmarks. As an example, he cited the Samanantar corpus, which provides parallel sentences across multiple Indian languages, enabling translation from Indic languages to English as well as between Indic languages.

"The challenge is to apply those approaches to various areas. Translating these sentences and paragraphs and building these parallel corpora is not easy, but with initiatives like AI for Bharat, it's a clarion call to the industry to build these data sets and contribute to creating these fundamental blocks," he added.

A promising future 

Although it is hard to pinpoint what comes next, Pani noted that users in India speak many languages. That is why Reverie is working on capturing users' engagement behaviour, how they communicate, and similar parameters. The vision is to keep making interactions as fluid as possible.

"We will continue to delight our users. Currently, we are witnessing an uptick in the growth of our product features across languages. Our aim is to continue to add more languages and the challenges will remain the same. One of the other challenges is to keep up-to-date with what people are talking about, what are the new subjects people like to discuss, and more – this will always keep us on our toes," concluded Singhal.
