The trend of releasing ever-larger language models has accelerated in recent years. From Jurassic-1 Jumbo from the Israeli firm AI21 Labs (178 billion parameters) and OpenAI’s GPT-3 (175 billion parameters) to Meta AI’s Open Pretrained Transformer (OPT-175B), also with 175 billion parameters - we have to ask: is it worth building ever-larger LLMs, or are we missing something?

“Trying to build intelligent machines by scaling up language models is like building a high-altitude aeroplane to go to the moon. You might beat altitude records, but going to the moon will require a completely different approach,” said Yann LeCun in an earlier post.

A glimpse of that “different approach” can be seen in Meta AI’s recent move to share access to its OPT LLM with the scientific and academic research community. The team released both the pre-trained models and the code required to train and use them, a first for a language technology system of this magnitude. “We are sharing Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology,” the company said in a blog post.

What purposes will open access serve?

Over the last few years, large language models, a class of natural language processing (NLP) systems, have advanced research across the NLP and AI domains. After being trained on large and varied volumes of text, they demonstrate a startling ability to write creative content, solve simple math problems, answer reading-comprehension questions, and more. However, limited access to these LLMs defeats the real purpose: reproducible and responsible research at scale. Meta AI’s move serves multiple purposes, including:

  • Help understand what’s happening under the hood: It is more important than ever to understand the ‘how’ and ‘what’ behind the working of these large language models. Broader access enables work on improving the robustness of these systems and on mitigating known problems, including toxicity and bias.

Take, for instance, one of the major points of concern: uneven internet penetration. Although the amount of data available on the internet is vast, it does not guarantee diversity, as argued in the paper titled “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”. Moreover, the paper’s authors, including the computer scientist Timnit Gebru, point out that internet penetration in Africa is only 39.3 per cent, whereas it is over 90 per cent in Europe.

  • An approach towards Responsible AI: Given their prominence in many downstream language applications, the whole AI community, including academic researchers, governments, civil society, and industry, must work together to set clear principles around responsible AI in general and responsible large language models in particular. Extending access to large language models to a wider part of the AI community will help the field work toward responsible language models, enhance transparency, and collectively advance as a whole.
  • May help address concerns around privacy: Even when trained on public data, large datasets of hundreds of gigabytes drawn from multiple sources contain sensitive and personally identifiable information (PII), such as names, contact numbers, addresses, gender, and so on. A joint study by researchers from Apple, OpenAI, Berkeley, Stanford and Northeastern University demonstrated that, simply by querying a pre-trained language model, it is possible to extract verbatim details of its training data.
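The extraction risk described above can be illustrated at miniature scale. The sketch below is a toy bigram model over a made-up corpus (the “contact” record and all names in it are invented for illustration; this is not the actual attack from the study). It shows the underlying failure mode: a model that memorizes its training text can be prompted with a short prefix to regurgitate the rest verbatim.

```python
# Toy illustration: a tiny bigram "language model" memorizes its training
# text, so prompting it with a prefix can leak training data verbatim.
from collections import Counter, defaultdict

# Hypothetical training corpus with a planted PII-like record (invented).
corpus = (
    "the quick brown fox jumps over the lazy dog . "
    "contact alice example at 555 0100 for details . "
    "the quick brown fox sleeps ."
).split()

# Count bigram successors: model[w] maps each word to follower counts.
model = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    model[w][nxt] += 1

def generate(prompt, steps=6):
    """Greedy decoding: always pick the most frequent successor word."""
    out = prompt.split()
    for _ in range(steps):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Querying with a prefix of the sensitive record extracts the rest verbatim.
leak = generate("contact alice")
print(leak)  # -> contact alice example at 555 0100 for details
```

Real LLMs are vastly more capable than this toy, but the mechanism the study exploits is analogous: rare, unique sequences seen during training can dominate the model’s predictions when the right prefix is supplied.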

Additionally, as the amount of computing power required during the experiment and training phases has grown enormously, there is growing concern in the AI community about the carbon footprint of these language models. “We developed OPT-175B with energy efficiency in mind by successfully training a model of this size using only 1/7th the carbon footprint of GPT-3. This was achieved by combining Meta’s open source Fully Sharded Data Parallel (FSDP) API and NVIDIA’s tensor parallel abstraction within Megatron-LM,” said the company.
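The 1/7th claim can be sanity-checked with back-of-the-envelope arithmetic. The ~500 tCO2e figure for GPT-3 used below is a widely cited external estimate, not a number from this article or an official OpenAI figure, so treat the result as an approximation:

```python
# Rough sanity check of the "1/7th the carbon footprint" claim.
# 500 tCO2e for GPT-3 training is a commonly cited external estimate.
gpt3_estimate_tco2e = 500
opt_claimed_fraction = 1 / 7

opt_estimate_tco2e = gpt3_estimate_tco2e * opt_claimed_fraction
print(f"Implied OPT-175B footprint: ~{opt_estimate_tco2e:.0f} tCO2e")
```

This implies a footprint of roughly 70 tCO2e for OPT-175B, in the same ballpark as the approximately 75 tCO2e the OPT paper itself reports.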

Conclusion

For AI research to progress, the scientific community must be able to work with cutting-edge models, both to explore their promise and to probe their shortcomings. While the potential of large language models is huge, the constraints and risks these models pose are still poorly understood.

Researchers are constrained in their ability to devise detection and mitigation measures for potential harm without direct access to these models, leaving detection and mitigation in the hands of only those with sufficient wealth to access models of this scale.

Also, read other stories on language models:

  1. Google's REALM: Integrating Retrieval into Language Representation Models
  2. Open AI & Stanford researchers urge drawbacks of language models to be addressed
