Language modelling is used in a variety of applications, including machine translation, speech recognition, question answering, and sentiment analysis. In essence, a language model is comparable to a language's grammar in that it predicts the likelihood of the next word in a sequence. It is a crucial component of modern natural language processing.
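
To make that idea concrete, here is a toy sketch, not any of the models discussed below, of a bigram language model that estimates the probability of the next word purely from word-pair counts in a tiny made-up corpus:

```python
# Toy illustration: a bigram language model that estimates the probability
# of the next word from counts in a tiny corpus. Purely for illustration.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_probs(word):
    """Return P(next word | word) estimated from the counts."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # e.g. {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

Modern large language models do the same job of assigning probabilities to the next word, only with neural networks and billions of parameters instead of simple counts.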

Language models are crucial when designing natural language processing (NLP) applications. Developing complex NLP language models from scratch, on the other hand, takes time. From OpenAI's GPT-3; Switch Transformer, GLaM and PaLM from Google; Turing-NLG from Microsoft; and Gopher from DeepMind to Jurassic-1 from AI21 Labs, the recent evolution of language models has been remarkable.

However, most of these language models are not open-sourced. Why does that matter? It is more important than ever to understand the ‘how’ and ‘what’ behind how these large language models work. Open access enables work on improving the robustness of these systems and on mitigating existing challenges, including toxicity and bias. For a more detailed analysis of how open-sourcing language models will propel the industry forward, read here.

So, we have curated a list of the top three open-sourced language models:

1| OPT from Meta AI

Recently, the Meta AI team shared access to its OPT LLM with the scientific and academic research community. In addition, the team released both the pre-trained models and the code required to train and use them, which is a first for a language technology system of this magnitude.

“In line with Meta AI’s commitment to open science, we are sharing Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology. Access to the model will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; along with industry research laboratories around the world,” Meta AI said in a blog post.

As per the team, OPT-175B was built with energy efficiency in mind: a model of this size was trained with only 1/7th the carbon footprint of GPT-3. This was achieved by combining Meta’s open-source Fully Sharded Data Parallel (FSDP) API and NVIDIA’s tensor-parallel abstraction within Megatron-LM. “We achieved ~147 TFLOP/s/GPU utilization on NVIDIA’s 80 GB A100 GPUs, roughly 17 per cent higher than published by NVIDIA researchers on similar hardware.”
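
For readers curious what that parallelism looks like in practice, below is a minimal sketch of the general FSDP idea using PyTorch's public FullyShardedDataParallel API on a small stand-in model. It is an illustration of parameter sharding, not Meta's actual OPT training code, and it assumes a multi-GPU machine with a script launched via torchrun.

```python
# Minimal sketch of Fully Sharded Data Parallel (FSDP): parameters, gradients
# and optimizer state are sharded across GPUs instead of replicated.
# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_sketch.py
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")                  # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Small stand-in model; OPT-175B itself is a far larger decoder-only Transformer.
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
        num_layers=6,
    )
    # Wrap with FSDP so each rank holds only a shard of the weights.
    model = FSDP(model, device_id=local_rank)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # One dummy training step on random data, just to show the loop shape.
    x = torch.randn(4, 128, 512, device=local_rank)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```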

2| Switch Transformer from Google Brain

Last year, the team of researchers at Google Brain open-sourced the Switch Transformer, a natural language processing (NLP) AI model. The model scales up to 1.6T parameters and improves training time by up to 7x compared to the T5 NLP model, with comparable accuracy.

The Switch Transformer, as per the paper published on arXiv, uses a mixture-of-experts (MoE) paradigm that combines several expert feed-forward blocks within the Transformer. “Switch Transformers are scalable and effective natural language learners. We simplify Mixture of Experts to produce an architecture that is easy to understand, stable to train and vastly more sample efficient than equivalently-sized dense models. We find that these models excel across a diverse set of natural language tasks and in different training regimes, including pre-training, fine-tuning and multi-task training,” as per the paper.

As only a portion of the model is needed to process each input, the number of model parameters can be increased while keeping the computational cost roughly constant. Moreover, baseline versions of the Switch Transformer can reach target pre-training perplexity in 1/7th the training time of Google's state-of-the-art T5 NLP model. Also, the 1.6T-parameter version outperforms T5-XXL on the perplexity metric, with comparable or better performance on downstream NLP tasks, despite training on half the data.
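
To illustrate the routing idea, here is a self-contained sketch of a Switch-style top-1 mixture-of-experts feed-forward layer. It is a conceptual toy, not Google's released implementation; the class name SwitchFFN and the tiny dimensions are made up for illustration.

```python
# Conceptual sketch of Switch-style top-1 routing: each token is sent to
# exactly one expert feed-forward network, so compute per token stays roughly
# constant no matter how many experts (parameters) the layer holds.
import torch
from torch import nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token picks its single best expert.
        probs = F.softmax(self.router(x), dim=-1)
        expert_idx = probs.argmax(dim=-1)                   # top-1 routing decision
        gate = probs.gather(-1, expert_idx.unsqueeze(-1))   # gate value scales output
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = gate[mask] * expert(x[mask])
        return out

# Usage: 8 experts hold 8x the FFN parameters, but each token runs through one.
layer = SwitchFFN(d_model=64, d_ff=256, num_experts=8)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```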

3| GPT-NeoX from EleutherAI

The buzz in the AI world started when EleutherAI open-sourced its large language model (LLM) GPT-NeoX-20B, which consists of 20 billion parameters. The model was pre-trained with the GPT-NeoX framework on CoreWeave's cloud infrastructure, using a cluster of 96 NVIDIA A100 Tensor Core GPUs for distributed training. GPT-NeoX-20B performs very well when compared to its counterparts that are available for public access.

The researchers published their paper on arXiv and stated: “We release GPT-NeoX-20B, motivated by the belief that open access to LLMs is critical to advancing research in a wide range of areas—particularly in AI safety, mechanistic interpretability, and the study of how LLM capabilities scale. Many of the most interesting capabilities of LLMs only emerge above a certain number of parameters, and they have many properties that simply cannot be studied in smaller models.”
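
As a rough sketch of what working with the released model can look like, the snippet below loads the publicly hosted EleutherAI/gpt-neox-20b checkpoint with the Hugging Face Transformers library and generates a short continuation. It assumes sufficient GPU memory (roughly 40 GB in fp16 for the 20B weights) and the accelerate package for automatic device placement.

```python
# Sketch: loading the released GPT-NeoX-20B checkpoint with Hugging Face
# Transformers and generating text. Assumes enough GPU memory and `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # roughly halves memory vs. fp32
    device_map="auto",          # spread layers across available GPUs
)

inputs = tokenizer("Open access to large language models", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```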
