In a country like India, where diversity is part of its very ethos, communicating with clarity and accuracy is critical. The Eighth Schedule of the Constitution of India recognizes 22 languages, and although English is spoken in certain quarters, vernacular communication is still the order of the day. This holds not only for offline conversations but also for social media platforms, where users create and consume content.

With new-age technologies like AI and ML gaining ground, many existing challenges, even the smallest ones, are gradually being addressed. Koo, India’s homegrown microblogging platform, gives its users a space to express their opinions and discuss trending subjects. What makes it stand out, however, is its effort to give users an immersive experience in their own languages.

To understand this better, the Research & Content Lead of INDIAai, Jibu Elias, caught up with Phaneesh Gururaj, President – Technology, Koo. Gururaj leads the engineering and innovation team at the company; before joining Koo, he was at the forefront of the Engineering and Product teams at redBus.

The workings of NLP

In a candid chat, Gururaj shared that Koo’s vision has always been to connect people so that they can express their thoughts and opinions without any language barrier. Although users were accustomed to products developed in the West, those products were clearly built with English speakers in mind.

“With the Jio revolution that has happened, there’s a huge set of people embracing social media’s day-to-day usage. These users have certain hurdles, where they were not so well-versed in using things as a tech person would do, and understand the overall discussions that are going on. So, how do people participate? It’s about removing the facade and putting a skin that they can connect with. In a broad aspect, that’s nothing but the language,” explains Gururaj.

With the language barrier removed, people have found it much easier to embrace social media and share their thoughts and opinions seamlessly.

Of course, none of this can be separated from technology. Gururaj says that although natural language processing (NLP) has existed for a while, it is far more democratized today.

“Things are available at a large scale; you have much [more powerful] machines and stronger algorithms, which can do NLP in a much faster way. We have embraced some of these latest trends that are happening on the NLP side, and we have been able to make them available to our end users,” he adds.

The basket of languages

Koo supports Kannada, Nepali, Assamese, Telugu, Tamil, Marathi, and many other popular languages, including those of the Hindi heartland and southern India. Because enabling the first language end-to-end took time, the platform developed a playbook that has helped it accelerate the rollout of the languages that followed.

“You need to know the different nuances of each of these languages. They have their own grammar and set of things that are quite unique. Most Indians are bilingual; they know most of the languages, so they tend to mix them in their thoughts and opinions too,” shares Gururaj.
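The language mixing Gururaj describes can be made a little more concrete. One simple, admittedly partial, signal is whether a single post uses more than one script. The sketch below is an illustration only, not Koo’s implementation: the function names are hypothetical, the script is read off each letter’s Unicode character name, and it cannot catch mixing typed entirely in Roman script (for example, Hindi written in Latin letters), which would need a trained language-identification model.

```python
import unicodedata
from collections import Counter
from typing import List, Optional


def token_script(token: str) -> Optional[str]:
    """Return the dominant script of a token, judged only by its letters.

    The script is taken from the first word of each letter's Unicode name,
    e.g. "KANNADA LETTER BA" -> "KANNADA". Returns None if the token has no letters.
    """
    scripts = Counter(
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in token
        if ch.isalpha()
    )
    if not scripts:
        return None
    return scripts.most_common(1)[0][0]


def scripts_in_post(text: str) -> List[str]:
    """Script label for every whitespace-separated token that contains letters."""
    labels = (token_script(tok) for tok in text.split())
    return [label for label in labels if label is not None]


def is_code_mixed(text: str) -> bool:
    """Treat a post as code-mixed (by script) if its tokens use more than one script."""
    return len(set(scripts_in_post(text))) > 1
```

For a post such as "Movie ಬಹಳ good", `scripts_in_post` labels the tokens LATIN, KANNADA, LATIN, so `is_code_mixed` returns `True`.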

The platform also combines open-source tools with its own techniques for translation and for what it calls transliteration.
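In its usual sense, transliteration maps text from one script to another without translating it. For closely related Indic scripts, a common naive starting point exploits the fact that Unicode’s Indic blocks share a layout derived from ISCII, so most Kannada characters (U+0C80 to U+0CFF) sit exactly 0x80 above their Telugu counterparts. The sketch below shows only that offset technique; it is not Koo’s method, the function name is hypothetical, and a production system would need an explicit mapping table for characters that have no true counterpart.

```python
import unicodedata

# Kannada block and its fixed offset from the parallel Telugu block.
KANNADA_FIRST, KANNADA_LAST = 0x0C80, 0x0CFF
KANNADA_TO_TELUGU_SHIFT = 0x80


def kannada_to_telugu(text: str) -> str:
    """Naively transliterate Kannada script to Telugu script by code-point offset.

    Characters outside the Kannada block, and Kannada characters whose shifted
    code point is unassigned in this Python's Unicode database, pass through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if KANNADA_FIRST <= cp <= KANNADA_LAST:
            candidate = chr(cp - KANNADA_TO_TELUGU_SHIFT)
            # Only substitute when the Telugu code point actually exists.
            if unicodedata.name(candidate, None) is not None:
                out.append(candidate)
                continue
        out.append(ch)
    return "".join(out)


# "namaskara" in Kannada script, written with escapes for clarity.
print(kannada_to_telugu("\u0ca8\u0cae\u0cb8\u0ccd\u0c95\u0cbe\u0cb0"))
```

Because the mapping is purely positional, it preserves spelling rather than meaning, which is exactly why transliteration alone does not solve the translation problem discussed next.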

“We try to understand any language that is out there, and we see how it can be converted into a common language for which there are quite a few libraries. Say if I want to convert Kannada to Malayalam, it’s generally Kannada to English, and then English to Malayalam. During this chain, what happens is that to a large extent it works, but the true meaning I want to convey is somewhere lost. We have tried to see how we can have a wrapper on top of these frameworks, and solve this problem,” he explains.
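The Kannada-to-English-to-Malayalam chain Gururaj describes is usually called pivot translation. The sketch below, which is not Koo’s code, shows the routing logic under stated assumptions: translators are plain callables registered by hypothetical (source, target) language codes, English is the default pivot, and a direct model is preferred whenever one exists, which is one place a wrapper of the kind he mentions could intervene.

```python
from typing import Callable, Dict, Tuple

# A translator maps text in one language to text in another.
Translator = Callable[[str], str]
# Hypothetical registry keyed by (source, target) language codes, e.g. ("kn", "en").
Registry = Dict[Tuple[str, str], Translator]


def translate(text: str, src: str, tgt: str, registry: Registry, pivot: str = "en") -> str:
    """Translate directly if a model exists for (src, tgt); otherwise chain through a pivot."""
    if src == tgt:
        return text
    direct = registry.get((src, tgt))
    if direct is not None:
        return direct(text)
    try:
        to_pivot = registry[(src, pivot)]
        from_pivot = registry[(pivot, tgt)]
    except KeyError as missing:
        raise ValueError(f"no direct or pivot route for {src}->{tgt}") from missing
    # Two hops: any nuance the pivot language cannot express is lost here.
    return from_pivot(to_pivot(text))


# Toy "translators" that just tag their input, to make the route visible.
toy_registry: Registry = {
    ("kn", "en"): lambda s: f"en[{s}]",
    ("en", "ml"): lambda s: f"ml[{s}]",
}
print(translate("namaskara", "kn", "ml", toy_registry))  # ml[en[namaskara]]
```

The toy output makes the problem visible: the Malayalam step only ever sees the English intermediate, never the original Kannada.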

Lost in translation: A challenge?

Several NLP academics believe that while translating from English into an Indian language is hard, translating between two Indian languages is far harder, largely because there is too little language corpus to train models on.

But Gururaj says Koo is working to close this gap. “To build any language model, you need structured and labeled data, or as we call in the ML world, corpus. The labeled data is not that easy to get for some of the long-tail languages. We are investing on our own to build some of these so that we have decent corpus data,” he adds.

The first few models are likely to be somewhat weak, since labeled data exists for only a few languages.

“Once we learn from how the other languages are performing, we can incorporate some of the labels other people are talking about and try to include in other languages where there are gaps,” explains Gururaj. 

Koo has a small in-house setup called ‘Label Ops’ that labels all the content posted on the platform. The effort is a work in progress, and the company draws on tools from Facebook and Stanford NLP for assistance.

Some other challenges

As a content platform, Koo wants to keep both sides of its user base happy: those who create content and those who consume it.

“For instance, if I am a creator and I know English, Hindi, and Kannada, and maybe I understand Tamil. We try and show the preview to the user (of the translation) so that we get a signal from the users to know if the translation was good or not, or even accurate. This comes as a feedback loop, where we try and improve the translation accuracy. When we understand the signal, it goes back to the labeling,” he shares, adding that the team also has to keep an eye on how long these models take to run, what they cost, and how they scale.
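The preview-and-signal loop described above can be sketched as a small aggregator. Everything here is a hypothetical illustration rather than Koo’s system: the class name, the thumbs-up/down signal, and the thresholds (at least three ratings, under 60% approval) are assumptions chosen only to show how user feedback could be routed back into labeling.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List


@dataclass
class TranslationFeedback:
    """Aggregate good/bad signals on translation previews.

    Translations that enough users rated poorly are queued for relabeling.
    """
    min_votes: int = 3               # ignore translations with too few ratings
    approval_threshold: float = 0.6  # below this share of "good" votes -> relabel
    _good: DefaultDict[str, int] = field(default_factory=lambda: defaultdict(int), init=False)
    _total: DefaultDict[str, int] = field(default_factory=lambda: defaultdict(int), init=False)

    def record(self, translation_id: str, was_good: bool) -> None:
        """Store one user's verdict on one translation preview."""
        self._total[translation_id] += 1
        if was_good:
            self._good[translation_id] += 1

    def relabel_queue(self) -> List[str]:
        """IDs of sufficiently rated translations whose approval rate is too low."""
        return sorted(
            tid
            for tid, total in self._total.items()
            if total >= self.min_votes
            and self._good[tid] / total < self.approval_threshold
        )
```

Requiring a minimum number of ratings keeps a single unhappy reader from sending a translation back for labeling, which matters when labeling capacity, like model run time and cost, is limited.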

Vocal for local

As a shining example of Atmanirbhar Bharat (“self-reliant India”), Koo has advice for other companies.

“There are so many problems if you look from the Bharat user perspective. You have to understand a day in the life of an Indian user. You have to dissect it, from the time he gets up to when he goes to sleep. There are so many aspects where you can add value,” says Gururaj. 

The future

The times ahead look very exciting for Koo, and Gururaj believes that technologies like AI, ML, and NLP will play a critical role in defining the shape of the company.

“More research and development (R&D) work needs to be done, of course. We need to see how language is impacting our metrics, we also want to understand the expectations of our users, what they like, what they want to consume, and how do we do things at a massive scale. We also want to see how language translation accuracy is improved, and eventually make it open source,” he concludes.

