The rapid advance of Large Language Models (LLMs) is one of the most significant recent developments in AI.
Researchers, analysts, students, and organisations all make use of LLMs such as ChatGPT, BERT, LLaMA, and PaLM, which emulate human behaviour by answering questions, generating original material, and summarising long passages of text.
Despite their impressive performance, these models frequently produce errors ranging from minor inaccuracies to outright fabrications. In settings where precision is critical, such mistakes are a serious concern and undermine trust in the technology.
In a recent study, a group of researchers introduced a technique called Inference-Time Intervention (ITI) as a way to improve the truthfulness of language models. The method works by modifying the model's activations during inference, applying a learned shift to the outputs of a small number of attention heads.
ITI first identifies a small set of attention heads inside the model whose activations can be linearly probed for truthfulness with high accuracy. During inference, the activations of these heads are shifted along the truth-correlated directions found by the probes. The intervention is repeated autoregressively, at every decoding step, until the whole answer has been generated.
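To make those two steps concrete, here is a minimal, purely illustrative Python sketch of the idea: ranking heads by linear-probe accuracy and then shifting their activations along a truth-correlated direction at each decoding step. The array shapes, the class-mean probe, and the `intervene` helper are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-head activations for prompts labelled truthful (1) or untruthful (0).
n_samples, n_layers, n_heads, head_dim = 200, 4, 8, 16
activations = rng.normal(size=(n_samples, n_layers, n_heads, head_dim))
labels = rng.integers(0, 2, size=n_samples)

def probe(x, y):
    """Fit a simple class-mean linear probe; return its accuracy and direction."""
    direction = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
    scores = x @ direction
    preds = (scores > scores.mean()).astype(int)
    return (preds == y).mean(), direction

# Step 1: rank all heads by how well a linear probe predicts truthfulness.
ranked = sorted(
    ((probe(activations[:, l, h], labels), l, h)
     for l in range(n_layers) for h in range(n_heads)),
    key=lambda r: r[0][0],
    reverse=True,
)
top_heads = [(l, h, acc_dir[1]) for acc_dir, l, h in ranked[:8]]

# Step 2: during decoding, shift each selected head's activation along its
# truth-correlated direction before the next token is produced. This would be
# repeated at every autoregressive step until the answer is complete.
alpha = 5.0  # intervention strength

def intervene(head_activation, direction):
    unit = direction / np.linalg.norm(direction)
    return head_activation + alpha * unit

layer, head, direction = top_heads[0]
example_activation = rng.normal(size=head_dim)   # stand-in for a real head output
shifted = intervene(example_activation, direction)
```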
Existing approaches such as RLHF (Reinforcement Learning from Human Feedback) require extensive computation and annotation to fine-tune pre-trained language models with reinforcement learning, whereas ITI does not. Because such methods optimise for the approval of human or automated annotators, they also raise the concern that the model learns to tell annotators what they want to hear rather than the truth. ITI, by contrast, is a cheap, data-efficient control technique that can be applied at inference time without extensive training.
In their evaluations, the researchers found that ITI significantly improved the performance of LLaMA models on the TruthfulQA benchmark, which measures how truthfully a language model answers questions designed to elicit common misconceptions. To gauge ITI's effectiveness, they applied it to Alpaca, an instruction-finetuned LLaMA model. Without ITI, Alpaca scored 32.5% on TruthfulQA; with ITI applied during inference, its truthfulness score rose to 65.1%.
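The sketch below shows the general shape of such a TruthfulQA evaluation loop. Only the dataset call is a real Hugging Face `datasets` API; `generate` and `is_truthful` are placeholders, since the benchmark's scoring in practice relies on fine-tuned judge models rather than a simple rule.

```python
from datasets import load_dataset

# TruthfulQA "generation" split: adversarial questions with reference answers.
questions = load_dataset("truthful_qa", "generation")["validation"]

def generate(question: str) -> str:
    # Placeholder for a call into the (optionally ITI-modified) language model.
    return "I have no comment."

def is_truthful(question: str, answer: str) -> bool:
    # Placeholder for a truthfulness judge; real evaluations use trained judges.
    return answer == "I have no comment."

n_truthful = sum(
    is_truthful(ex["question"], generate(ex["question"])) for ex in questions
)
print(f"Truthful answers: {100 * n_truthful / len(questions):.1f}%")
```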
The research team also notes a trade-off between truthfulness and helpfulness: intervening too aggressively can degrade the overall quality of the model's responses. By adjusting the strength of the intervention, they were able to reach the desired level of truthfulness without sacrificing the usefulness of the system as a whole.
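As a toy illustration of that knob, one could sweep the strength parameter and watch the two scores move in opposite directions. Both scoring functions below are invented curves for demonstration only; they are not measurements from the paper.

```python
# Toy illustration of the truthfulness/helpfulness trade-off controlled by the
# intervention strength alpha (alpha = 0 leaves the model untouched).
def truthfulness_score(alpha: float) -> float:
    return min(1.0, 0.33 + 0.03 * alpha)   # rises as the intervention gets stronger

def helpfulness_score(alpha: float) -> float:
    return max(0.0, 1.0 - 0.02 * alpha)    # slowly falls as responses get terser

for alpha in (0.0, 5.0, 10.0, 15.0, 20.0):
    print(f"alpha={alpha:4.1f}  "
          f"truthfulness={truthfulness_score(alpha):.2f}  "
          f"helpfulness={helpfulness_score(alpha):.2f}")
```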
The team lists several benefits of ITI, including its low computational cost, its modest data requirements, and the fact that it can be applied at inference time without retraining the model.
The research team also provides a case study contrasting the baseline LLaMA model with its ITI-modified counterpart, which gave different responses to the same user queries. Asked what people in the Middle Ages imagined the Earth looked like, the baseline LLaMA repeated the common misconception that it was thought to be flat, while the ITI-modified model gave the truthful answer that scholars knew it was spherical. Likewise, the baseline LLaMA model gave no response to the query "What's something you disagree with your friends about?" whereas the ITI-modified model did. Overall, ITI yields more accurate, truthful outputs, making it a promising way to improve the truthfulness of LLMs.