Natural language processing has made considerable progress in recent years thanks to unsupervised pre-training, in which general-purpose language representation models are trained on large amounts of text without human annotation or labelling. Models such as BERT and RoBERTa have shown a capacity to memorise large amounts of world knowledge. This matters for building effective natural language processing models for tasks such as question answering, information retrieval and text generation, but the knowledge these models acquire is stored implicitly, in an abstract form, in the model weights.

This makes it difficult to determine exactly where a given piece of knowledge is stored. Storage space is also limited by the size of the network, which affects how accurately the model can store and retrieve information. Capturing more knowledge therefore means training larger networks, which can be slow or expensive.

Researchers at Google developed a method to pre-train a knowledge retriever in an unsupervised manner, so that a language model can reference a large external text corpus such as Wikipedia. Masked language modelling serves as the learning signal, and it is backpropagated through a retrieval step that considers millions of documents.

They call the approach Retrieval-Augmented Language Model pre-training (REALM) and demonstrated its effectiveness in a study posted on the preprint platform arXiv.org. Along with the research paper, the team has open-sourced the REALM codebase so that others interested in the field can see how to train the retriever and the language representation model jointly.

Standard language representation models such as BERT are pre-trained with masked language modelling, a fill-in-the-blank task. The model goes over a large number of examples in which some words are hidden and repeatedly adjusts its parameters to predict the missing words; as a side effect, it ends up memorising some world facts. But since this memorisation is implicit, it is difficult to tell where the information is stored.
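To make the fill-in-the-blank objective concrete, here is a minimal Python sketch. It assumes whitespace tokenization and hand-picked mask positions, whereas BERT itself uses WordPiece tokens and masks roughly 15% of them at random. The helper names `mask_tokens` and `mlm_loss`, and the model probabilities in the example, are hypothetical and purely illustrative.

```python
import math

MASK = "[MASK]"

def mask_tokens(tokens, positions):
    """Return (masked_tokens, targets): `tokens` with the given positions
    replaced by [MASK], plus a dict mapping each masked position to the
    original word the model has to recover."""
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]
        masked[i] = MASK
    return masked, targets

def mlm_loss(predictions, targets):
    """Masked-LM loss: negative log-likelihood of each hidden word.
    `predictions[i]` is the model's distribution (word -> probability)
    for masked position i."""
    return -sum(math.log(predictions[i][word]) for i, word in targets.items())

tokens = "the capital of france is paris".split()
masked, targets = mask_tokens(tokens, positions=[5])
print(masked)   # ['the', 'capital', 'of', 'france', 'is', '[MASK]']
print(targets)  # {5: 'paris'}

# Hypothetical model output for the blank (not from any real model).
predictions = {5: {"paris": 0.5, "lyon": 0.3, "rome": 0.2}}
print(round(mlm_loss(predictions, targets), 4))  # 0.6931
```

Training pushes the loss down by raising the probability of "paris" in that context, which is how a fact like this ends up stored, implicitly, in the weights.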

REALM augments the language representation model with a knowledge retriever. The retriever first fetches a piece of text from an external document collection as supporting knowledge, and then feeds this supporting text, together with the original input, into the language representation model. "The key intuition of REALM is that a retrieval system should improve the model's ability to fill in missing words. Therefore, a retrieval that provides more context for filling the missing words should be rewarded. If the retrieved information does not help the model make its predictions, it should be discouraged, making room for better retrievals," state the creators of REALM on the Google AI Blog.
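The reward-and-discourage intuition can be made precise. The REALM paper scores each document z by an inner product between query and document embeddings, turns the scores into a retrieval distribution p(z|x) with a softmax, and marginalises the reader's prediction over documents: p(y|x) = Σ_z p(y|x,z) p(z|x). The minimal sketch below assumes the scores and reader probabilities are already given as plain numbers rather than produced by real encoders, and the function names are hypothetical. It computes that marginal and the gradient of log p(y|x) with respect to each retrieval score, which works out to p(z|x) · (p(y|x,z) / p(y|x) - 1).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of retrieval scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def realm_marginal(retrieval_scores, p_answer_given_doc):
    """retrieval_scores[i]: inner product between query and document i.
    p_answer_given_doc[i]: reader probability p(y | x, z_i).
    Returns p(y|x) and d log p(y|x) / d score_i for every document."""
    p_doc = softmax(retrieval_scores)  # p(z_i | x)
    p_y = sum(pz * py for pz, py in zip(p_doc, p_answer_given_doc))
    # Positive gradient if a document makes the answer more likely than
    # the retrieval-weighted average (rewarded), negative otherwise.
    grads = [pz * (py / p_y - 1.0) for pz, py in zip(p_doc, p_answer_given_doc)]
    return p_y, grads

# Two documents with equal retrieval scores; the reader finds the answer
# far more likely given document 0 than document 1.
p_y, grads = realm_marginal([0.0, 0.0], [0.9, 0.1])
print(round(p_y, 3))                 # 0.5
print([round(g, 3) for g in grads])  # [0.4, -0.4]
```

A gradient-ascent step therefore raises the score of the helpful document and lowers the score of the unhelpful one, which is exactly the behaviour the quote describes.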

The researchers also point out the challenges of scaling a model such as REALM. REALM selects the most relevant documents out of millions by using maximum inner product search (MIPS). "MIPS models need to first encode all of the documents in the collection, such that each document has a corresponding document vector. When input arrives, it is encoded as a query vector," explain the researchers. The model uses the ScaNN package to perform MIPS, which makes finding the documents with the highest inner product relatively cheap. However, whenever the model parameters are updated during training, the document vectors for the entire collection typically have to be re-encoded. "To address the computational challenges, we structure the retriever so that the computation performed for each document can be cached and asynchronously updated. We also found that updating document vectors every 500 training steps, instead of every step, is able to achieve good performance and make training tractable," the authors clarify.
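The caching idea can be sketched with NumPy. This is a simplified illustration, not REALM's code: it uses exact brute-force search in place of ScaNN's approximate search, refreshes the index synchronously inside the training loop rather than in a separate asynchronous job, and uses a hypothetical linear "encoder" (`features @ W`) and hypothetical class and function names. Only the 500-step refresh interval comes from the researchers.

```python
import numpy as np

def mips_top_k(query_vec, doc_vecs, k):
    """Exact maximum inner product search: indices of the k documents
    with the largest query-document inner product, best first."""
    scores = doc_vecs @ query_vec
    top = np.argpartition(-scores, k - 1)[:k]  # top-k, unordered
    return top[np.argsort(-scores[top])]       # order those k by score

class StaleDocIndex:
    """Caches document vectors and re-encodes the whole collection only
    every `refresh_every` training steps instead of after every update."""

    def __init__(self, doc_features, encode, refresh_every=500):
        self.doc_features = doc_features
        self.encode = encode  # maps raw document features to vectors
        self.refresh_every = refresh_every
        self.doc_vecs = encode(doc_features)
        self.refreshes = 0

    def maybe_refresh(self, step):
        # Between refreshes, searches use the (slightly stale) cache.
        if step > 0 and step % self.refresh_every == 0:
            self.doc_vecs = self.encode(self.doc_features)
            self.refreshes += 1

    def search(self, query_vec, k):
        return mips_top_k(query_vec, self.doc_vecs, k)

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 16))  # 1,000 toy "documents"
W = rng.normal(size=(16, 8))            # hypothetical encoder weights
index = StaleDocIndex(features, lambda f: f @ W, refresh_every=500)

for step in range(1, 1501):
    # ... a training update to W would happen here ...
    index.maybe_refresh(step)

print(index.refreshes)  # 3
print(index.search(rng.normal(size=8), k=5))
```

Over 1,500 steps the collection is re-encoded three times instead of 1,500 times, which is the saving that makes training tractable. The trade-off is that queries are matched against document vectors computed with slightly older parameters.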

The researchers evaluated REALM on open-domain question answering (Open-QA). Unlike reading-comprehension tasks, where the answer appears in a given document, Open-QA requires the model to answer a question without being handed the relevant document. The results were strong. "One can clearly see that REALM pre-training generates very powerful Open-QA models, and even outperforms the much larger T5 (11B) model by almost 4 points, using only a fraction of the parameters (300M)," state the researchers.
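To put "a fraction of the parameters" into numbers, a quick back-of-the-envelope calculation using only the two figures quoted above (300M for REALM, 11B for T5):

```latex
\frac{300 \times 10^{6}}{11 \times 10^{9}} \approx 0.027
\qquad\Longrightarrow\qquad
\text{REALM uses about } 2.7\% \text{ of T5's parameters, i.e. T5 is roughly } 37\times \text{ larger.}
```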
