Researchers have developed a machine-learning method that lets AI chatbots and intelligent keyboards learn from new user data directly on a smartphone.

Personalized deep-learning models can help create intelligent keyboards that dynamically update to predict the next word from a user's typing history, or chatbots that adapt to a user's dialect. This kind of customization requires continuously refining a machine-learning model with new data.

Cloud servers

User data is commonly transmitted to cloud servers for model updates, because smartphones and other edge devices lack the memory and computational power needed for fine-tuning. However, transmitting confidential user information to a cloud server consumes substantial energy and poses a security risk. The researchers devised a method that makes it practical to adapt deep-learning models to new sensor data directly on an edge device.

PockEngine

PockEngine, their on-device training method, identifies which portions of a large machine-learning model need updating to improve accuracy, and stores and computes with only those portions. Most of this work is performed while the model is being prepared, before it ever runs, which reduces the computational burden and speeds up fine-tuning.
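As a rough sketch of the underlying idea (not PockEngine's actual implementation), one can freeze a model and update only the pieces judged worth the cost. In this PyTorch example the choice of which layer to unfreeze is hard-coded for illustration, whereas PockEngine derives it automatically at compile time:

```python
import torch
import torch.nn as nn

# A minimal sketch of sparse fine-tuning: freeze everything, then unfreeze
# only the layers deemed worth updating. The toy model and the selection
# below are illustrative assumptions, not PockEngine's real analysis.
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

for param in model.parameters():
    param.requires_grad = False          # freeze the whole model

for param in model[4].parameters():      # hypothetical choice: update only the head
    param.requires_grad = True

# Only the unfrozen parameters need gradient storage and optimizer state.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3)
```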

PockEngine demonstrated a substantial acceleration in on-device training, running up to 15 times faster than alternative approaches on some hardware platforms, without any loss of model accuracy. The researchers also found that their fine-tuning method improved the accuracy with which a popular AI chatbot answered complex questions.

Data processing

Deep-learning models are built on neural networks, which comprise many layers of interconnected nodes, or "neurons," that process data to make predictions. Inference is the process of running the model: a piece of data, such as an image, is passed from layer to layer until the prediction, such as the image's label, emerges at the end. During inference, each layer's intermediate results can be discarded once the data has moved on to the next layer.
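The following is a minimal PyTorch sketch of inference; the framework choice and the toy classifier are illustrative assumptions. Running under `no_grad` tells the framework that intermediate activations can be freed as the data flows forward:

```python
import torch
import torch.nn as nn

# A minimal sketch of inference: the input flows through the layers once,
# and since no gradients are needed, each intermediate activation can be
# freed as soon as the next layer has consumed it.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

image = torch.randn(1, 1, 28, 28)        # a stand-in for a real input image
with torch.no_grad():                    # do not keep activations for backprop
    logits = model(image)
label = logits.argmax(dim=1)             # the predicted class label
```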

Backpropagation, by contrast, is what the model does while it is being trained or fine-tuned. The model's output is compared against the correct answer, and then the model is run backwards, with each layer adjusted so that the output moves closer to the right answer.
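A corresponding sketch of one fine-tuning step, reusing the toy classifier above, shows why training is heavier: the forward pass must keep every layer's activations so the backward pass can traverse the model in reverse (the batch and hyperparameters are again illustrative):

```python
import torch
import torch.nn as nn

# A minimal sketch of one training step with backpropagation. The forward
# pass must store all intermediate activations so gradients can flow back.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

image = torch.randn(8, 1, 28, 28)        # a toy batch of inputs
target = torch.randint(0, 10, (8,))      # the "correct answers"

logits = model(image)                    # forward pass (activations are stored)
loss = loss_fn(logits, target)           # compare output with the correct answer
loss.backward()                          # run the model backwards, layer by layer
optimizer.step()                         # nudge each layer toward the right answer
optimizer.zero_grad()
```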

Fine-tuning

Fine-tuning needs more memory than inference because every layer may need to change, so the whole model and all intermediate results must be kept. However, not every layer in the neural network has to be updated to improve accuracy. Some layers, or pieces of layers, can stay the same, and those don't need to be stored. Moreover, improving accuracy may not require backpropagating all the way to the first layer; the process can be stopped somewhere in the middle.
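A minimal sketch of this idea, assuming PyTorch and an arbitrary split point: freezing the early layers lets the backward pass stop partway, so their gradients are never computed and their stored state is never needed:

```python
import torch
import torch.nn as nn

# A minimal sketch of truncated backpropagation: with the early layers
# frozen, the backward pass stops partway through the network. The split
# point chosen here is illustrative, not PockEngine's actual decision.
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),      # early layers: frozen
    nn.Linear(256, 256), nn.ReLU(),      # later layers: fine-tuned
    nn.Linear(256, 10),
)

for param in model[:2].parameters():
    param.requires_grad = False

x = torch.randn(4, 128)
loss = model(x).sum()
loss.backward()                          # gradients stop at the frozen layers

print(model[0].weight.grad)              # None: backprop never reached it
print(model[2].weight.grad.shape)        # populated: this layer gets updated
```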

PockEngine exploits these facts to speed up fine-tuning and reduce the memory and computation it requires. First, the system fine-tunes each layer, one at a time, on a given task and measures the accuracy improvement after each layer. In this way, PockEngine identifies each layer's contribution, weighs the costs and benefits of fine-tuning it, and automatically determines what percentage of each layer needs to be fine-tuned.
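One plausible way to express that cost/benefit probe in code; the `evaluate` and `finetune_one_epoch` helpers are hypothetical stand-ins for a real training and validation loop, not part of PockEngine's published interface:

```python
import copy

# A minimal sketch of per-layer sensitivity analysis: fine-tune one layer
# at a time and record how much validation accuracy improves versus how
# many parameters that layer would cost to update.
def layer_sensitivity(model, layer_names, evaluate, finetune_one_epoch):
    baseline = evaluate(model)
    report = {}
    for name in layer_names:
        trial = copy.deepcopy(model)               # probe each layer in isolation
        for p in trial.parameters():
            p.requires_grad = False                # freeze everything...
        layer = dict(trial.named_modules())[name]
        n_params = 0
        for p in layer.parameters():
            p.requires_grad = True                 # ...except this one layer
            n_params += p.numel()
        finetune_one_epoch(trial)
        gain = evaluate(trial) - baseline
        report[name] = (gain, n_params)            # benefit vs. update cost
    return report
```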

A pared-down model

Traditionally, the backpropagation graph is generated at runtime, which requires extensive computation. Instead, PockEngine builds it during compilation, when the model is being optimized for deployment. It deletes unnecessary layers or pieces of layers, producing a pared-down graph of the model to use at runtime, and then applies further optimizations to this graph to improve its efficiency.
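A conceptual sketch of that compile-time step follows; the graph representation here is invented for illustration, since PockEngine actually operates on a real compiler's intermediate representation:

```python
# A conceptual sketch of compile-time pruning: given per-layer decisions
# from the analysis above, drop the backward-pass nodes for frozen layers
# before the model is ever deployed, so that work never happens at runtime.
def prune_backward_graph(backward_nodes, trainable_layers):
    """Keep only the backward nodes that feed a layer we intend to update."""
    return [node for node in backward_nodes if node["layer"] in trainable_layers]

backward_graph = [
    {"layer": "embed", "op": "grad_matmul"},
    {"layer": "block1", "op": "grad_matmul"},
    {"layer": "head", "op": "grad_matmul"},
]
runtime_graph = prune_backward_graph(backward_graph, trainable_layers={"head"})
print(runtime_graph)   # only the head's backward work remains at runtime
```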

Conclusion

Because this graph preparation only has to happen once, computational overhead at runtime is reduced. When the researchers used PockEngine to train deep-learning models on various edge devices, including Apple M1 chips, the digital signal processors found in many smartphones, and Raspberry Pi computers, on-device training ran up to 15 times faster with no loss of accuracy, and PockEngine substantially reduced the memory required for fine-tuning. The researchers next plan to use PockEngine to fine-tune even larger models designed to process images and text simultaneously.

