OpenAI's GPT-3 was the largest language model released last year. Since 2017, deep-learning models have gone through an extensive makeover, especially in the field of natural language processing (NLP). Some of the Transformer-family NLP models are BERT, ALBERT, and the GPT series.
While developers across the world were eagerly waiting for the GPT-3 model from OpenAI, they were met with disappointment: OpenAI, which had open-sourced GPT-3's predecessors, instead licensed GPT-3 exclusively to Microsoft. This development left a gaping hole in many developers' dreams and expectations. However, they have found a saviour in EleutherAI, the developers of GPT-Neo, a GPT-3-like NLP model that is open source.
Stella Biderman, Leo Gao, Sid Black, and others formed EleutherAI with the idea of making AI technology that would be open source to the world. One of the first problems the team chose to tackle was making a GPT-like language model that would be accessible to all.
While the source code for an NLP model was already available, OpenAI, a billion-dollar company, trained the GPT models on some of the biggest datasets in the world. According to VentureBeat, a private corpus of 500 billion tokens was used for training the model, and approximately $50 million was spent on computing costs. To replicate the GPT models, one would incur high computing costs and need massive amounts of data.
GPT-Neo is built using Mesh TensorFlow for distributed support. While it is designed for TPUs, it can run on GPUs too. However, the EleutherAI team realised that even the TPUs provided through the TensorFlow Research Cloud (TFRC) wouldn't suffice to train GPT-Neo.
That's when EleutherAI set out to create an open-source dataset of a scale comparable to what OpenAI used for its GPT language models, and built the Pile, an 825GB dataset specifically designed to train language models. This effort was supported by CoreWeave, a US-based cryptocurrency miner, which offered the EleutherAI team access to its hardware in exchange for an open-source GPT-3-like model.
The Pile is a dataset drawn from 22 diverse sources, including academic sources (ArXiv, PubMed, FreeLaw etc.), internet webpages (StackExchange, Wikipedia etc.), dialogue from subtitles, GitHub, and more.
On March 22, 2021, after months of painstaking research and training, the EleutherAI team released two trained GPT-style language models, GPT-Neo 1.3B and GPT-Neo 2.7B. The code and the trained models are open-sourced under the MIT license, and the models can be used for free through Hugging Face's Transformers library.
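For readers who want to try the released checkpoints, here is a minimal sketch of generating text with GPT-Neo 1.3B through the Transformers library; the prompt and sampling settings are illustrative, and the snippet assumes the transformers and torch packages are installed.

```python
from transformers import pipeline

# Load GPT-Neo 1.3B from the Hugging Face Hub
# (downloads several GB of weights on first run).
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

# Generate a short continuation; the sampling settings here are illustrative.
prompt = "EleutherAI set out to build an open-source alternative to GPT-3 because"
outputs = generator(prompt, max_length=60, do_sample=True, temperature=0.9)

print(outputs[0]["generated_text"])
```

Swapping the model identifier for "EleutherAI/gpt-neo-2.7B" loads the larger checkpoint, at the cost of more memory and slower generation.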
When comparing GPT-Neo and GPT-3, VentureBeat made interesting observations. GPT-Neo did better than GPT-3 Ada on Hellaswag and Piqa. Hellaswag is a multiple-choice sentence-completion benchmark that pairs a context paragraph with four possible endings. Piqa measures common-sense reasoning, where the model has to pick the one of two sentences that makes the most sense. GPT-Neo also outperformed GPT-3 Ada on Winogrande, a benchmark that uses common sense to resolve ambiguous pronouns in a sentence. However, Ada is not the largest GPT-3 version; that is GPT-3 Davinci. With about 65 times as many parameters as GPT-Neo, Davinci comfortably beat GPT-Neo on all these benchmarks.
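Scores on these benchmarks can, in principle, be reproduced with EleutherAI's own lm-evaluation-harness. The sketch below assumes a recent (v0.4-style) release of that library; the exact Python API, task names, and result layout may differ across versions.

```python
import lm_eval

# Evaluate GPT-Neo 1.3B on the benchmarks discussed above.
# simple_evaluate and these task names follow recent versions of
# EleutherAI's lm-evaluation-harness; older releases used a different API.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neo-1.3B",
    tasks=["hellaswag", "piqa", "winogrande"],
    batch_size=8,
)

# Print per-task metrics (e.g. accuracy) for each benchmark.
for task, metrics in results["results"].items():
    print(task, metrics)
```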
The bottom line here is: GPT-Neo is a great open-source alternative to GPT-3, especially given OpenAI’s closed access policy.