Tech Mahindra has introduced Project Indus, an AI initiative aimed at India's large population of Hindi speakers.

According to reports, Project Indus is a "civilizational" program intended to support the Indic languages descended from the Indus Civilization. Tech Mahindra aims to build an open-source large language model (LLM) that could serve 25% of the world's population.

Indian language models

Language technology matters all the more in a country as linguistically and culturally varied as India, where many languages and dialects are spoken. Natural language processing (NLP) and speech recognition allow machines to understand and respond to human text and speech. Progress in this field could affect the lives of more than a billion people, because most of their communication still takes place in their native languages.

The use of machines to translate Indian languages dates back to the 1980s. More recently, advances in deep learning and computing power, together with Bhashini, a nationwide effort to unify this work, have accelerated artificial intelligence (AI) research on Indian languages. AI4Bharat, an open-source community of engineers, domain experts, policymakers, and academics, is also developing AI solutions to address India's most pressing social, economic, and environmental issues.

Central Institute of Indian Languages

The Central Institute of Indian Languages maintains a corpus of over 3.5 million words in various major Indian languages, which will be expanded to 25 million words per language. The existing corpora are raw and will be cleaned before use. Building corpora in these languages will make it easier to compare and contrast the structure and workings of Indian languages. In addition, at least 100 corpora for minor languages will be collected, each totalling 3 to 5 million words depending on how much text is available.

Project Indus

India's linguistic and cultural diversity is a testament to the breadth and depth of human culture. More than 1,600 languages are spoken across the country.

India's many languages, from Hindi to Tamil and from Bengali to Gujarati, each add their own colour to its cultural life. Together they reflect the variety of voices and the rich history behind India's cultural heritage, and they foster a strong sense of unity.

In its initial phase, the project team plans to cover 40 Hindi dialects, with more dialects to be added later.

The team is also inviting the public to contribute expressions, terminology, and dialogue. These contributions will help build what would be India's largest domestically developed LLM.
