NVIDIA, the global leader in chip technology, has launched a lightweight artificial intelligence model for Hindi, India’s most widely spoken language. India has 22 constitutionally recognized languages and more than 1,500 others recorded by the country’s census; only around 10% of its residents speak English, the internet’s most common language.

As India, the world’s most populous country, forges ahead with rapid digitalization efforts, its enterprises and local startups are developing multilingual AI models that enable more Indians to interact with technology in their primary language.  

These projects are building language models for Indic languages and English that can power customer-service AI agents for businesses, rapidly translate content to broaden access to information, and help services reach a diverse population of more than 1.4 billion people.

To support initiatives like these, NVIDIA has released a small language model for Hindi, a language with more than half a billion speakers, according to an official statement. Now available as an NVIDIA NIM microservice, the model, dubbed Nemotron-4-Mini-Hindi-4B, can be easily deployed on any NVIDIA GPU-accelerated system for optimized performance.
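
NIM microservices expose an OpenAI-compatible HTTP API, so a deployed instance can be queried with standard client libraries. The sketch below is illustrative only and assumes a default local deployment; the endpoint URL, port, API key placeholder and model identifier are assumptions, not values confirmed in the statement.

```python
# Minimal sketch: querying a locally deployed Nemotron Hindi NIM microservice
# through its OpenAI-compatible API. Base URL, port and model name are assumed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed default local NIM endpoint
    api_key="not-needed-for-local-nim",   # placeholder; hosted endpoints require a real key
)

response = client.chat.completions.create(
    model="nvidia/nemotron-4-mini-hindi-4b-instruct",  # assumed model identifier
    messages=[
        {"role": "user", "content": "भारत की राजधानी क्या है?"}  # "What is the capital of India?"
    ],
    max_tokens=128,
    temperature=0.2,
)
print(response.choices[0].message.content)
```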

Tech Mahindra is the first to use the Nemotron Hindi NIM microservice to develop an AI model called Indus 2.0, which is focused on Hindi and dozens of its dialects, the company stated. Indus 2.0 harnesses Tech Mahindra’s high-quality fine-tuning data to further boost model accuracy, unlocking opportunities for clients in banking, education, healthcare and other industries to deliver localized services.  

AI in Hindi, Made Easy Through NVIDIA NIM  

“The Nemotron Hindi model has 4 billion parameters and is derived from Nemotron-4 15B, a 15-billion parameter multilingual language model developed by NVIDIA. The model was pruned, distilled and trained with a combination of real-world Hindi data, synthetic Hindi data and an equal amount of English data using NVIDIA NeMo, an end-to-end, cloud-native framework and suite of microservices for developing generative AI,” the company added.  
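
As a rough illustration of the training mixture described above (real Hindi data, synthetic Hindi data and an equal amount of English), the following sketch balances and shuffles such a corpus. The file names and mixing logic are assumptions for illustration only; this is not NVIDIA's actual recipe or a NeMo API.

```python
# Illustrative sketch: assembling a balanced fine-tuning corpus of real Hindi,
# synthetic Hindi and an equal amount of English. Paths and split are assumed.
import json
import random

def load_jsonl(path):
    """Read one JSON document per line, returning a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

real_hindi = load_jsonl("hindi_real.jsonl")            # assumed real-world Hindi data
synthetic_hindi = load_jsonl("hindi_synthetic.jsonl")  # assumed synthetic Hindi data
english = load_jsonl("english.jsonl")                  # assumed English corpus

hindi_total = real_hindi + synthetic_hindi
# Match the Hindi portion with an equal number of English documents.
english_sample = random.sample(english, min(len(hindi_total), len(english)))

mixture = hindi_total + english_sample
random.shuffle(mixture)

with open("training_mixture.jsonl", "w", encoding="utf-8") as f:
    for doc in mixture:
        f.write(json.dumps(doc, ensure_ascii=False) + "\n")
```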

The dataset was created with NVIDIA NeMo Curator, which improves generative AI model accuracy by processing high-quality multimodal data at scale for training and customization, NVIDIA said in its statement. According to the company, NeMo Curator uses NVIDIA RAPIDS libraries to accelerate data processing pipelines on multi-node GPU systems, lowering processing time and total cost of ownership. It also provides pre-built pipelines and building blocks for synthetic data generation, data filtering, classification and deduplication.
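
The filtering and deduplication steps attributed to NeMo Curator can be pictured with a plain-Python sketch like the one below. It does not use the NeMo Curator API; it only illustrates the kinds of operations named in the statement, and the quality thresholds are assumed heuristics.

```python
# Plain-Python illustration of quality filtering and exact deduplication,
# the kinds of curation steps the statement attributes to NeMo Curator.
import hashlib

def quality_filter(doc: str, min_chars: int = 200) -> bool:
    """Keep documents long enough to be useful and mostly alphabetic (assumed thresholds)."""
    alpha = sum(ch.isalpha() for ch in doc)
    return len(doc) >= min_chars and alpha / max(len(doc), 1) > 0.6

def deduplicate(docs):
    """Exact deduplication by hashing normalized text."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def curate(docs):
    """Apply the quality filter, then remove exact duplicates."""
    return deduplicate([d for d in docs if quality_filter(d)])
```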

“After fine-tuning with NeMo, the final model leads on multiple accuracy benchmarks for AI models with up to 8 billion parameters. Packaged as a NIM microservice, it can be easily harnessed to support use cases across industries such as education, retail and healthcare,” NVIDIA stated.  

According to the statement, the model is available as part of the NVIDIA AI Enterprise software platform. It gives businesses access to additional resources, including technical support and enterprise-grade security, to streamline AI development for production environments.  

"Innovators, major enterprises and global systems integrators across India are building customized language models using NVIDIA NeMo. Companies in the NVIDIA Inception program for cutting-edge startups are using NeMo to develop AI models for several Indic languages," NVIDIA opined.  
