With the goal of breaking down language barriers to help people connect and better understand the world around them, Google Translate regularly adopts new technologies so that more people can use the tool. In 2022, it added 24 new languages using Zero-Shot Machine Translation, in which a machine learning model learns to translate into another language without ever seeing an example. 

Google has also announced the 1,000 Languages Initiative, a commitment to building AI models that will support the 1,000 most spoken languages worldwide. 

Recently, Google announced that it is expanding the range of languages Translate supports with the help of its PaLM 2 large language model (LLM), which is particularly effective at learning languages that are closely related to one another, such as regional dialects. 

Google has rolled out 110 new languages in Google Translate, its largest expansion ever. Among the newly supported languages are seven Indian regional languages: Awadhi, Bodo, Khasi, Kokborok, Marwadi, Santali, and Tulu.  

Bridging the language gap  

From Cantonese to Qʼeqchiʼ, these new languages represent more than 614 million speakers, opening up translations for around 8% of the world’s population. Some are major world languages with over 100 million speakers. Others are spoken by small communities of Indigenous people, and a few have almost no native speakers but active revitalisation efforts. About a quarter of the new languages come from Africa, representing Google Translate’s most significant expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof. 

Some of the other newly supported languages in Google Translate are: 

  • Afar is a tonal language spoken in Djibouti, Eritrea and Ethiopia. Of all the languages in this launch, Afar had the most volunteer community contributions. 
  • Cantonese has long been one of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, finding data and training models is tricky. 
  • Manx is the Celtic language of the Isle of Man. It almost went extinct when its last native speaker died in 1974. But thanks to an island-wide revival movement, there are now thousands of speakers. 
  • NKo is a standardised form of the West African Manding languages that unifies many dialects into a common language. Its unique alphabet was invented in 1949, and it has an active research community that develops resources and technology for it today. 
  • Punjabi (Shahmukhi) is a variety of Punjabi written in Perso-Arabic script (Shahmukhi), and it is the most spoken language in Pakistan. 
  • Tamazight (Amazigh) is a Berber language spoken across North Africa. Although there are many dialects, the written form is generally mutually understandable. It’s written in Latin and Tifinagh scripts, both of which Google Translate supports. 
  • Tok Pisin is an English-based creole and the lingua franca of Papua New Guinea. If you speak English, try translating a sentence into Tok Pisin; you might be able to understand the meaning! 

Google’s way of choosing language varieties 

According to a blog post by Google, there’s a lot to consider when adding new languages to Translate — everything from the varieties they offer to the specific spellings people use. 

Languages have immense variation: regional varieties, dialects, and different spelling standards. In fact, many languages have no one standard form, so it’s impossible to pick the “right” variety. Google’s approach has prioritised the most commonly used varieties of each language. For example, Romani is a language that has many dialects throughout Europe. Google’s models produce text closest to Southern Vlax Romani, a commonly used variety online. But it also mixes in elements from others, like Northern Vlax and Balkan Romani. 

PaLM 2 was a crucial piece of the puzzle, says Google, helping Translate learn closely related languages more efficiently. These include languages close to Hindi, like Awadhi and Marwadi, and French creoles like Seychellois Creole and Mauritian Creole. As technology advances and Google continues to partner with expert linguists and native speakers, the company says it will support even more language varieties and spelling conventions over time. 

Sources of Article

Source: Google
