LiGO is a technique developed by MIT researchers to accelerate the training of very large machine-learning models.

The researchers' framework uses the neuron weights of a smaller model as building blocks to train a larger neural network faster. Their machine-learning approach also learns, in a data-driven manner, how to expand the model's width and depth.

Overview

Researchers have found that when these models become big enough, they can do things smaller models cannot. But bigger models take more time and money to train, because the training process involves showing a model hundreds of billions of examples. Gathering that much data is a complicated process in and of itself, and training a billion-parameter model on it is slower and costlier still.

Objective

The method uses machine learning to encode the knowledge of a smaller model and "grow" a larger model from it. As a result, it expedites the training process for the larger model. For example, the method reduces the computational cost of training a large model by about 50% compared to methods that start from scratch. The MIT method also outperformed other strategies that use smaller models to train larger models faster.

Lowering the time required to train enormous models could help researchers achieve improvements more quickly and at a lower cost while simultaneously reducing the carbon emissions produced during training. It could also let smaller research groups use these massive models, enabling many innovations.

Experiment

Large language models like GPT-3, which powers ChatGPT, are built using transformer neural networks. Transformer architectures are distinctive because, as these neural network models grow in size, their performance improves dramatically. These models frequently have hundreds of millions or billions of learnable parameters.

Training all of these parameters from scratch is costly, so researchers look for ways to speed up the process. One technique that has proven beneficial is model growth. With model growth, researchers can expand the size of a transformer by duplicating neurons, or even entire layers, from a previous version of the network and stacking them on top. They can make a network wider by adding new neurons to an existing layer, or they can make it deeper by adding more layers of neurons.
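To make the two growth directions concrete, here is a minimal sketch in plain Python. It is an illustration only, not the researchers' code: each layer is assumed to be a small weight matrix stored as a list of rows (one row per neuron), and the helper names grow_depth and grow_width are hypothetical.

```python
import copy


def grow_depth(layers, times=2):
    """Deepen a network by stacking copies of its existing layers on top.

    `layers` is a list of weight matrices (lists of rows). The whole stack
    is repeated `times` times, so the result has len(layers) * times layers.
    """
    return [copy.deepcopy(layer) for _ in range(times) for layer in layers]


def grow_width(weight, new_out):
    """Widen one layer by duplicating existing neurons.

    Each row of `weight` holds the incoming weights of one neuron. Rows are
    copied cyclically until the layer has `new_out` neurons.
    """
    old_out = len(weight)
    return [list(weight[i % old_out]) for i in range(new_out)]


# A toy "small model": two layers with two neurons each (made-up numbers).
small = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]

deeper = grow_depth(small)        # four layers: the two originals, repeated
wider = grow_width(small[0], 4)   # first layer grown from 2 to 4 neurons
```

In a real network, widening one layer also changes the input size that the next layer expects, so the following layer's weights would have to be adjusted as well. The sketch leaves that step out to keep the idea visible.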

Conclusion

Their technique, which they refer to as a learned Linear Growth Operator (LiGO), uses data-driven learning to expand the width and depth of a larger network based on the parameters of a smaller network. It does this by learning a linear map from the smaller model's parameters to the larger model's parameters. The smaller model may itself be quite large, potentially containing a hundred million parameters, and researchers may wish to create a model with one billion parameters, so a single map between them would be enormous. Hence, the LiGO method breaks the linear map into smaller pieces that a machine-learning algorithm can handle.
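The idea of a linear map broken into smaller pieces can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes the pieces are one matrix that expands a layer's outputs (b_out), one that expands its inputs (a_in), and a table of mixing coefficients (mix) that builds each new layer as a weighted sum of the widened small layers. All names and numbers are hypothetical, and the matrices are fixed by hand here, whereas LiGO learns its operator from data.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def expand_width(w, b_out, a_in):
    """Widen one layer linearly: new_w = b_out @ w @ a_in^T."""
    return matmul(matmul(b_out, w), transpose(a_in))


def expand_depth(layers, mix):
    """Build each new layer as a weighted sum of the given layers.

    mix[l][j] is the weight of layer j in new layer l.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [
            [sum(c * layer[r][k] for c, layer in zip(coeffs, layers)) for k in range(cols)]
            for r in range(rows)
        ]
        for coeffs in mix
    ]


def grow(layers, b_out, a_in, mix):
    """Apply the width pieces first, then the depth piece."""
    widened = [expand_width(w, b_out, a_in) for w in layers]
    return expand_depth(widened, mix)


# Toy small model: two 2x2 layers (made-up numbers).
small = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
# Hand-picked pieces: keep the two old units, add a third as their average.
b_out = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a_in = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Three new layers: copies of layers 0 and 1, then their average.
mix = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

grown = grow(small, b_out, a_in, mix)  # three 3x3 layers
```

Even in this toy case the saving is visible. A single unstructured linear map from the 8 small-model weights to the 27 large-model weights would need 8 × 27 = 216 entries, while the three pieces above hold 6 + 6 + 6 = 18.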

Moreover, LiGO expands width and depth simultaneously, making it more efficient than other approaches. Kim, one of the researchers, notes that users can choose how wide and deep they want the larger model to be when they input the smaller model and its parameters. Their method trained faster than both other model-growth approaches and training a new model from scratch. It reduces the computing cost of training vision and language models by approximately 50 per cent while frequently improving performance. The researchers also found that LiGO could be used to expedite transformer training even without a smaller, pre-trained model.
