The trend of releasing ever-larger language models has gathered tremendous pace over the years. From Jurassic-1 Jumbo from Israeli firm AI21 Labs (178 billion parameters) and OpenAI’s GPT-3 (175 billion parameters) to Meta AI’s Open Pretrained Transformer (OPT-175B), also with 175 billion parameters, it is worth asking whether these LLMs are delivering on their promise or whether we are missing something.
“Trying to build intelligent machines by scaling up language models is like building a high-altitude aeroplane to go to the moon. You might beat altitude records, but going to the moon will require a completely different approach,” said Yann LeCun in an earlier post.
A glimpse of that “different approach” can be seen in Meta AI’s recent move to share access to its OPT LLM with the scientific and academic research community. The team released both the pre-trained models and the code required to train and use them, a first for a language technology system of this magnitude. “We are sharing Open Pretrained Transformer (OPT-175B), a language model with 175 billion parameters trained on publicly available data sets, to allow for more community engagement in understanding this foundational new technology,” the company said in a blog post.
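As an illustration of what that access can look like in practice, the sketch below loads one of the smaller publicly released OPT checkpoints and generates a short continuation. The use of the Hugging Face transformers library, the checkpoint name (facebook/opt-125m) and the prompt are assumptions made for this example, not details taken from Meta’s announcement.

```python
# Illustrative only: loading a small, publicly available OPT checkpoint.
# The checkpoint name "facebook/opt-125m" and the use of Hugging Face
# transformers are assumptions for this sketch, not Meta's release tooling.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```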
Over the last few years, large language models, natural language processing (NLP) systems trained on large and varied volumes of text, have advanced research across NLP and AI. They demonstrate a startling ability to write creative content, solve simple maths problems, answer reading-comprehension questions, and more. However, limited access to these LLMs has worked against the real purpose: reproducible and responsible research at scale. Meta AI’s move addresses several concerns at once.
Take, for instance, one of the major points of concern: uneven internet penetration. Although the amount of data available on the internet is vast, it does not guarantee diversity, as per the paper titled “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”. Moreover, the researchers, including the renowned computer scientist Timnit Gebru, pointed out that internet penetration in Africa is only 39.3 per cent, whereas it is over 90 per cent in Europe.
Additionally, as the amount of computing power required during the experiment and training phases has increased enormously, there is growing concern in the AI community about the carbon footprint of these language models. “We developed OPT-175B with energy efficiency in mind by successfully training a model of this size using only 1/7th the carbon footprint of GPT-3. This was achieved by combining Meta’s open source Fully Sharded Data-Parallel (FSDP) API and NVIDIA’s tensor parallel abstraction within Megatron-LM,” said the company.
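To make the sharding idea concrete, here is a minimal sketch of wrapping a toy model in PyTorch’s FullyShardedDataParallel (FSDP) API, which implements the parameter-sharding approach Meta cites. The toy model, hyperparameters and launch setup are assumptions for illustration; this is not Meta’s actual OPT-175B training code, which also layers in Megatron-LM tensor parallelism.

```python
# Minimal FSDP sketch (assumed setup, not Meta's OPT training code).
# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_sketch.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # One process per GPU; torchrun sets the rank/world-size environment variables.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Toy MLP standing in for a real transformer block.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()

    # FSDP shards parameters, gradients and optimizer state across ranks,
    # so each GPU only materialises the full weights when it needs them.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

As the company’s statement describes, this data-parallel sharding is combined in its setup with Megatron-LM’s tensor parallelism, which additionally splits individual weight matrices across GPUs.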
For AI research to progress, the scientific community must be able to work with cutting-edge models to explore their promise while also probing them for shortcomings. Unfortunately, while the potential of large language models is huge, the constraints and risks that these models pose are still not fully understood.
Without direct access to these models, researchers are constrained in their ability to devise detection and mitigation measures for potential harms, leaving that work in the hands of only those with sufficient wealth to access models of this scale.