It looks like natural language processing (NLP) can’t get any bigger than OpenAI’s latest offering - GPT-3. The California-based Artificial Intelligence (AI) research company OpenAI released a pre-print describing the mighty GPT-3, an NLP model with close to 175 billion trainable parameters. Parameters - whether many or few - give colour and shade to otherwise dry data, and they are what give a neural network a learned perspective on that data.
GPT-3’s parameter count is the biggest out there, exceeding even what Microsoft’s recently released ZeRO-2 training system was built to support - models of up to 170 billion parameters.
GPT-3 is a scaled-up version of its predecessor GPT-2, which OpenAI released last year. GPT-2 is an adaptation of the Transformer, a Google invention that predicts words by calculating probabilities based on the surrounding words.
Additionally, GPT-3 moves away from the traditional pre-trained language model approach, in which a model is trained on specific datasets for task-specific model architectures. Collating specific datasets is always a challenge, and it often limits an NLP model’s true potential. “Humans do not require large supervised datasets to learn most language tasks – a brief directive in natural language or at most a tiny number of demonstrations is often sufficient to enable a human to perform a new task to at least a reasonable degree of competence…this adaptability has practical advantages – it allows humans to seamlessly mix together or switch between many tasks and skills, for example performing addition during a lengthy dialogue. To be broadly useful, we would someday like our NLP systems to have this same fluidity and generality,” state the researchers in their pre-print paper titled “Language Models are Few-Shot Learners”, released on arXiv on Thursday.
They’ve been able to inch towards this goal with GPT-3 by achieving “meta-learning.” The researchers trained GPT-3 on Common Crawl data, a collection of nearly a trillion words of text scraped off the Web. Rather than learning specific tasks, the model acquires a broad set of skills and the ability to recognise patterns during training time. It then uses these abilities to adapt to, or recognise, the desired task at inference time. That means GPT-3 learns how to do a task in a single go and need not be re-trained for similar tasks. GPT-3 can perform impressive tasks ranging from completing sentences to recognising the logical entailment of statements to translating languages, and in some cases it performs these tasks better than versions of the Transformer that have been specifically trained to perform only that task. “Broadly, on NLP tasks GPT-3 achieves promising results in the zero-shot and one-shot settings, and in the few-shot setting [it] is sometimes competitive with or even occasionally surpasses state-of-the-art (despite state-of-the-art being held by fine-tuned models),” note the authors.
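This in-context adaptation is easy to picture with a small sketch. The snippet below is an illustration, not code from the paper - the prompt format, the `=>` convention, and the function name are my own - but it shows the key idea: the task is communicated to the model entirely through text, with a handful of demonstrations in the prompt and no re-training at all.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble an in-context learning prompt: a task description,
    a few worked demonstrations, and a new query left incomplete
    for the model to fill in. No gradient updates are involved."""
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model completes this line
    return "\n".join(lines)

# Hypothetical English-to-French demonstrations (two "shots").
prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
print(prompt)
```

The whole “program” is just a string; swapping in different demonstrations switches the model to a different task, which is what makes the approach so fluid.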
GPT-3’s ability to adapt to newer tasks, which were not directly present in the training set, was evaluated on more than two dozen NLP datasets with a few interesting experiments. These experiments were carried out under three broad categories - few-shot learning, one-shot learning, and zero-shot learning. GPT-3 performed very well on datasets related to translation, Q&A, and predictive text. It also did well at performing 3-digit arithmetic and at using novel words in a sentence. “There is no need to do gradient/parameter updates (fine-tuning) for using the GPT-3 model for various tasks. One can interact with the model using natural language and/or provide some examples of the tasks that you are trying to do, and the model will do it,” observe the researchers. Most impressively, GPT-3 also generated news articles that human evaluators found hard to distinguish from human-written articles!
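The three evaluation settings differ only in how many demonstrations appear in the prompt. A minimal sketch (my own illustration - the exact prompt wording the paper uses differs), applied to the 3-digit arithmetic task mentioned above:

```python
def k_shot_prompt(instruction, examples, query, k):
    """Build a prompt with exactly k in-context demonstrations:
    k=0 is zero-shot, k=1 is one-shot, larger k is few-shot."""
    lines = [instruction]
    for question, answer in examples[:k]:
        lines.append(f"Q: {question} A: {answer}")
    lines.append(f"Q: {query} A:")  # left for the model to answer
    return "\n".join(lines)

# Hypothetical 3-digit addition demonstrations.
demos = [("248 + 351", "599"), ("120 + 305", "425"), ("111 + 222", "333")]
zero_shot = k_shot_prompt("Add the numbers.", demos, "456 + 123", k=0)
one_shot = k_shot_prompt("Add the numbers.", demos, "456 + 123", k=1)
few_shot = k_shot_prompt("Add the numbers.", demos, "456 + 123", k=3)
```

In the zero-shot setting the model sees only the instruction and the query; each extra demonstration gives it more context to infer the task from, which is why few-shot results in the paper are typically the strongest of the three.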
However, the researchers are humble enough to list GPT-3’s shortcomings after introducing us to such impressive feats! “GPT-3 appears to be weak in the few-shot or one-shot setting at some tasks that involve comparing two sentences or snippets, for example, whether a word is used the same way in two sentences (WiC), whether one sentence is a paraphrase of another, or whether one sentence implies another,” the researchers admit. The researchers also ran a sentiment analysis to probe the model for racial bias. They found that the word “Asian” had an overall positive sentiment score, while the word “Black” consistently had a low sentiment score.
In examining the model’s association between gender and occupation, they observed that GPT-3 attributed most of almost 400 occupations to male identifiers. Religion, too, was tested, and the results reflected societal biases. Surprisingly, the researchers found that GPT-2 was more idealistic!
Furthermore, the paper addresses data contamination; energy consumption during training; and the large-scale impact and potential misuses of such an advanced language model, including “misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing, and social engineering pretexting.”