OPT-IML is a 175-billion-parameter language model that has been fine-tuned on roughly 2,000 language tasks; it will soon be freely available under a noncommercial licence for research use.
For context, Generative Pre-trained Transformer 3 (GPT-3) is a language model that uses deep learning to generate human-like text. Beyond prose, it can also produce code, stories, poetry, and more.
For OPT-IML, the researchers used instruction tuning to improve on their earlier OPT-175B model. Instruction tuning also made the model easier to adapt to diverse applications such as question answering, text summarisation, and translation.
Recent research has demonstrated that instruction tuning, a form of fine-tuning, improves the zero-shot and few-shot generalisation of large pre-trained language models to unseen tasks. However, the performance trade-offs of the various choices made during instruction tuning are only partially understood. These choices include the size and diversity of the instruction-tuning benchmark, different task-sampling strategies, fine-tuning with and without demonstrations, training with specialised datasets for reasoning and dialogue, and the fine-tuning objectives themselves.
In this study, the researchers describe how instruction-tuning decisions affect downstream task performance as both model and benchmark sizes increase. To this end, they developed OPT-IML Bench, a large benchmark for Instruction MetaLearning (IML) comprising 2,000 NLP tasks grouped into task categories drawn from eight existing benchmarks. They also create an evaluation framework to measure three levels of model generalisation: held-out instances from seen tasks, held-out tasks from seen categories, and tasks from fully held-out categories.
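To make the three generalisation levels concrete, here is a minimal sketch (not the authors' code) of how such evaluation splits could be carved out of a task registry; the task names and categories below are purely illustrative.

```python
# Illustrative sketch: splitting a task registry into the three
# generalisation levels described above. Not taken from OPT-IML's code.

# A toy registry: category -> list of task names (all names are made up).
TASK_REGISTRY = {
    "sentiment_analysis": ["imdb", "sst2", "yelp_polarity"],
    "question_answering": ["squad", "triviaqa", "natural_questions"],
    "summarization": ["cnn_dailymail", "xsum"],
}

# Level 3: hold out an entire category (its tasks are never seen in training).
held_out_categories = {"summarization"}

# Level 2: within the remaining categories, hold out individual tasks.
held_out_tasks = {"yelp_polarity", "natural_questions"}

train_tasks, eval_unseen_task, eval_unseen_category = [], [], []
for category, tasks in TASK_REGISTRY.items():
    for task in tasks:
        if category in held_out_categories:
            eval_unseen_category.append(task)  # task from a fully held-out category
        elif task in held_out_tasks:
            eval_unseen_task.append(task)      # held-out task from a seen category
        else:
            train_tasks.append(task)           # seen task; level 1 evaluates its held-out instances

print("train:", train_tasks)
print("eval (unseen tasks, seen categories):", eval_unseen_task)
print("eval (tasks from held-out categories):", eval_unseen_category)
```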
Using this framework, the researchers first present insights on instruction-tuning decisions as applied to OPT-30B, and then use them to train OPT-IML 30B and 175B, instruction-tuned versions of OPT. OPT-IML demonstrates all three generalisation abilities at both scales on four evaluation benchmarks with diverse objectives and input formats: PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. It not only significantly outperforms OPT on all benchmarks but is also highly competitive with existing models fine-tuned on each individual benchmark. The researchers release OPT-IML at both scales together with the OPT-IML Bench evaluation framework.
Related work
Their work on fine-tuning large language models to follow instructions touches on multi-task learning, prompting, and meta-training of in-context learning. Below, the researchers outline the prior work that relates most closely to their own.
Instruction Tuning
Recent efforts have begun fine-tuning these models on diverse sets of instructions to align them with following instructions and to eliminate the need for manual prompt engineering. One line of work focuses on fine-tuning the model on various tasks using human-annotated instructions and feedback; the other focuses on adding instructions, either through annotation or automatically, to publicly accessible benchmarks and datasets.
In their research, the researchers concentrate on the second approach and compile many publicly accessible datasets with instructions for fine-tuning OPT. Concurrently with their study, Chung et al. (2022a) present a similar scaling approach for 1,836 tasks from four benchmarks. While that work focuses on tuning on the full collection to push the limits of performance on challenging external benchmarks such as MMLU and Big-Bench Hard (BBH), the researchers here characterise the trade-offs of various instruction-tuning strategies that can affect downstream performance.
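As a hedged illustration of the general recipe (not the exact templates used for OPT-IML), the snippet below shows how a supervised example might be wrapped in an instruction template so that heterogeneous tasks share one text-to-text format; the template wording is invented for illustration.

```python
# Illustrative only: rendering a dataset example as an instruction-following
# prompt/target pair. The template is hypothetical, not from the paper.
def to_instruction_example(instruction: str, model_input: str, target: str) -> dict:
    """Wrap one supervised example in an instruction template."""
    prompt = f"{instruction}\n\nInput: {model_input}\nOutput:"
    return {"prompt": prompt, "target": " " + target}

example = to_instruction_example(
    instruction="Classify the sentiment of the review as positive or negative.",
    model_input="The film was a waste of two hours.",
    target="negative",
)
print(example["prompt"])
print(example["target"])
```

During fine-tuning, the language-modelling loss would then typically be computed only on the target tokens, while the prompt tokens serve as context.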
Multi-task Learning
Instruction-based fine-tuning can be framed as multi-task learning (MTL). MTL is a popular paradigm that improves generalisation on a task when it is trained jointly with related tasks that share comparable parameters or representations. In recent years, MTL has been applied to numerous NLP scenarios, primarily to improve performance on the training tasks or on new domains by exploiting signal from related tasks. In contrast, instruction-based fine-tuning aims to improve generalisation to never-before-seen tasks. It does so by casting all tasks into a single format via instructions and training on them jointly, sharing the model's weights across all tasks, as sketched below.
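The following is a minimal sketch, assuming a simple temperature-scaled proportional sampling scheme, of how examples from many tasks can be mixed into one training stream. The paper studies several task-sampling strategies, and this is not a reproduction of the exact method; the task names and sizes are made up.

```python
# Illustrative task mixing for multi-task instruction tuning.
import random

# Toy per-task example counts (hypothetical).
task_sizes = {"squad": 87_000, "sst2": 67_000, "xsum": 204_000}

def sample_task(sizes: dict, temperature: float = 2.0) -> str:
    """Sample a task with probability proportional to size ** (1 / temperature).

    temperature=1.0 gives size-proportional sampling; larger values flatten
    the distribution so that small tasks are seen more often.
    """
    weights = [count ** (1.0 / temperature) for count in sizes.values()]
    return random.choices(list(sizes.keys()), weights=weights, k=1)[0]

random.seed(0)
batch_tasks = [sample_task(task_sizes) for _ in range(8)]
print(batch_tasks)  # a mix of tasks whose examples would form one training batch
```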
Conclusion
Instruction tuning of LLMs has emerged as an efficient method for improving their zero-shot and few-shot generalisation capabilities. In this study, the researchers make three significant contributions to instruction tuning. First, they compile a large-scale benchmark for instruction tuning consisting of 2,000 NLP tasks from eight dataset collections, categorised by task type.
The researchers carefully construct evaluation splits on this benchmark to test three distinct kinds of model generalisation:
1) fully supervised performance on held-out instances of seen tasks,
2) performance on unseen tasks from categories of seen tasks, and
3) performance on tasks from entirely held-out categories.
Furthermore, the researchers train and release OPT-IML 30B and 175B, instruction-tuned models that significantly outperform OPT on the evaluation benchmarks and are competitive with existing instruction-tuned models tuned on individual benchmarks.
For more details:
GitHub: facebookresearch/metaseq
Article: OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization