Recently, researchers conducted a study to better understand how large-scale machine-learning models retrieve the knowledge they have stored.

In their research, the scientists showcase a method for probing what a model knows about different topics. Although large language models are widely used in domains such as customer care, code generation, and language translation, researchers still lack a complete understanding of how these models work.

Linear function

Large language models (LLMs) frequently employ a straightforward linear function to retrieve and interpret stored information. Moreover, a model uses the same decoding function for similar categories of information. A linear function is an equation that describes a direct relationship between two variables, with no exponents.
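To make the idea concrete, here is a minimal numpy sketch of an affine (linear) decoding function applied to a hidden state. The hidden size, the matrix W, the bias b, and the tiny vocabulary are illustrative placeholders, not values from the study.

```python
# Minimal sketch: reading an attribute out of a hidden state with an affine map.
# All sizes and values here are toy placeholders.
import numpy as np

hidden_size = 8                       # toy dimensionality of a subject's hidden state
vocab = ["Oslo", "London", "Paris"]   # toy candidate attributes

rng = np.random.default_rng(0)
subject_state = rng.normal(size=hidden_size)   # hidden state for, say, "Norway"

# A relation such as "capital city of a country" is modeled as one affine map
# r(s) = W @ s + b shared across all subjects of that relation.
W = rng.normal(size=(hidden_size, hidden_size))
b = rng.normal(size=hidden_size)
object_state = W @ subject_state + b           # predicted hidden state for the object

# Projecting the prediction through a (toy) unembedding matrix ranks candidate
# tokens; the top-ranked token is the decoded attribute.
unembed = rng.normal(size=(len(vocab), hidden_size))
scores = unembed @ object_state
print("decoded attribute:", vocab[int(np.argmax(scores))])
```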

The researchers showed that by identifying linear functions for different types of facts, they can probe the model to determine what it knows about new subjects and where within the model that knowledge is stored. Using a technique they developed to estimate these simple functions, the scientists found that even when a model gives an incorrect answer to a prompt, it often still stores the correct information.

Finding facts

Most large language models are transformer models, a type of neural network. Loosely based on the human brain, neural networks comprise billions of interconnected nodes, or neurons, organized into multiple layers, which encode and process data. As a transformer gains knowledge, it stores information about a given subject across many layers. When a user asks about that subject, the model must decode the most relevant fact to answer the query.
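For readers who want to inspect those layer-by-layer representations themselves, the sketch below uses the Hugging Face transformers library to collect a hidden state from every layer of a small stand-in model. The model (GPT-2) and prompt are illustrative choices, not those used in the study.

```python
# Collect the hidden state of the last prompt token at every transformer layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"   # small stand-in model; the study worked with larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital city of Norway is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states is a tuple: the embedding output plus one tensor per
# layer, each of shape (batch, sequence_length, hidden_size).
for layer_idx, layer_states in enumerate(outputs.hidden_states):
    last_token_state = layer_states[0, -1]
    print(f"layer {layer_idx}: hidden state of size {last_token_state.shape[0]}")
```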

Investigating LLMs

The researchers ran a series of experiments on LLMs and found that, despite the models' enormous complexity, they decode relational information using a simple linear function. Each function is specific to the type of fact being retrieved.

For example, the transformer would use one decoding function whenever it needs to output the instrument a person plays and a different function each time it needs to output the state where a person was born. The researchers developed a technique to estimate these simple functions and computed functions for 47 different relations, such as "capital city of a country" and "lead singer of a band." They chose this subset because such relations are representative of the facts that can be expressed this way, though many more relations exist.
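The estimation procedure itself belongs to the researchers; as a rough stand-in, the sketch below fits a relation-specific affine map by ordinary least squares on synthetic (subject state, object state) pairs. The arrays true_W, true_b, and the hidden states are fabricated for the example and only convey the shape of the idea.

```python
# Simplified stand-in: fit W and b for one relation from example pairs of
# subject and object hidden states.  All data here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
hidden_size, n_examples = 8, 20

# Pretend these were collected from the model for pairs like
# (Norway, Oslo), (England, London), ... under one relation.
true_W = rng.normal(size=(hidden_size, hidden_size))
true_b = rng.normal(size=hidden_size)
subject_states = rng.normal(size=(n_examples, hidden_size))
object_states = (subject_states @ true_W.T + true_b
                 + 0.01 * rng.normal(size=(n_examples, hidden_size)))

# Solve for [W | b] in one least-squares problem by appending a constant
# 1 feature to every subject state.
X = np.hstack([subject_states, np.ones((n_examples, 1))])
coef, *_ = np.linalg.lstsq(X, object_states, rcond=None)
W_hat, b_hat = coef[:-1].T, coef[-1]

print("max error in W:", np.abs(W_hat - true_W).max())
print("max error in b:", np.abs(b_hat - true_b).max())
```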

Evaluation

The researchers tested each function by changing the subject to see whether it could recover the correct object. For example, given the subject Norway, the "capital city of a country" function should return Oslo; given England, it should return London.

The functions retrieved the correct information in more than 60 per cent of cases, indicating that some information within a transformer is encoded and retrieved in this way.
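The toy loop below mimics that evaluation: apply one relation's map to each subject's vector and check whether the nearest candidate object is the correct one. The subjects, objects, and vectors are synthetic placeholders, so the resulting accuracy has no connection to the reported figure.

```python
# Toy evaluation: does the relation's linear map send each subject near its
# correct object?  Vectors are random placeholders.
import numpy as np

rng = np.random.default_rng(2)
hidden_size = 8
subjects = ["Norway", "England", "France"]
objects = ["Oslo", "London", "Paris"]

subject_states = {s: rng.normal(size=hidden_size) for s in subjects}
object_states = {o: rng.normal(size=hidden_size) for o in objects}

# Build a toy linear map that sends each subject state toward its object state.
X = np.stack([subject_states[s] for s in subjects])
Y = np.stack([object_states[o] for o in objects])
W = np.linalg.lstsq(X, Y, rcond=None)[0].T

correct = 0
for s, o in zip(subjects, objects):
    predicted = W @ subject_states[s]
    # Pick the candidate object whose stored vector is closest to the prediction.
    best = min(objects, key=lambda c: np.linalg.norm(object_states[c] - predicted))
    correct += best == o
print(f"accuracy: {correct}/{len(subjects)}")
```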

Visualizing a model’s knowledge

The researchers also used the functions to determine what the model believes about various topics.

In one experiment, they started with the prompt "Bill Bradley was a" and used the decoding functions for "plays sports" and "attended university" to test whether the model knows that Senator Bradley was a basketball player who attended Princeton.

The probing technique produced what the researchers call an "attribute lens," a grid that visualizes where information about a particular relation is stored across the transformer's layers.
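As a rough illustration of the idea rather than the researchers' tooling, the sketch below applies a toy affine function for each relation to hidden states from every layer and prints a layer-by-relation grid; the relations, states, and the decodes_correctly check are hypothetical placeholders.

```python
# Rough "attribute lens" sketch: for each relation, mark the layers at which
# its decoding function would recover the attribute.  Everything here is a
# placeholder, not the study's released tooling.
import numpy as np

n_layers, hidden_size = 12, 8
rng = np.random.default_rng(3)

# Pretend per-layer hidden states for one subject (e.g. "Bill Bradley").
layer_states = rng.normal(size=(n_layers, hidden_size))

# One toy affine function per relation, as in the sketches above.
relations = {
    "plays sports": (rng.normal(size=(hidden_size, hidden_size)),
                     rng.normal(size=hidden_size)),
    "attended university": (rng.normal(size=(hidden_size, hidden_size)),
                            rng.normal(size=hidden_size)),
}

def decodes_correctly(predicted_state):
    # Placeholder check; a real lens would compare the prediction against the
    # model's vocabulary ranking for the true attribute.
    return float(np.linalg.norm(predicted_state)) > 8.0

grid = {
    name: [decodes_correctly(W @ h + b) for h in layer_states]
    for name, (W, b) in relations.items()
}
for name, row in grid.items():
    print(f"{name:>20}: " + "".join("#" if hit else "." for hit in row))
```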

Conclusion

Automatically generating attribute lenses offers a streamlined way to help researchers understand a model better. This visualization tool could help scientists and engineers correct stored knowledge and reduce the risk of an AI chatbot spreading inaccurate information.

In the future, the researchers aim to better understand the cases in which facts are not stored linearly. They also plan to run experiments with larger models and to examine the accuracy of linear decoding functions. Eventually, this methodology could be used to find and fix inaccuracies inside a model, reducing its tendency to sometimes give incorrect or illogical answers.
