DeepSpeed-MII is a new open-source Python library from DeepSpeed that accelerates over 20,000 widely used deep learning models.

Although open-source software has made AI more accessible, there are still two critical obstacles to its general use: inference time and cost.

System optimizations can dramatically reduce DL model inference latency and cost, but they are not yet widely accessible. Few data scientists have the expertise to correctly identify and implement the set of system optimizations relevant to a particular model, leaving low-latency, low-cost inference largely out of reach. This inaccessibility stems mainly from the complexity of the DL inference landscape, which spans significant variations in model size, architecture, system performance characteristics, and hardware requirements.

DeepSpeed-MII

DeepSpeed-MII is a new open-source Python library developed by Microsoft Research to promote the broader adoption of low-latency, cost-effective inference of high-performance models. MII offers access to thousands of widely used DL models with highly efficient implementations.

For low-latency, low-cost inference, MII uses DeepSpeed-Inference optimizations such as deep fusion for transformers, automated tensor-slicing for multi-GPU inference, and ZeroQuant quantization. As a result, it enables quick, easy, and low-cost deployment of these models both on-premises and on Azure via AML (Azure Machine Learning).
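
As a sketch of what this looks like in practice, the snippet below follows the deploy-and-query pattern from MII's public examples; the model and deployment names are placeholders, and exact arguments may differ across MII versions.

    import mii

    # Deploy a Hugging Face model behind a local inference endpoint;
    # MII applies the DeepSpeed-Inference optimizations automatically.
    mii.deploy(task="text-generation",
               model="bigscience/bloom-560m",
               deployment_name="bloom560m_deployment")

    # Attach to the deployment and run a query against it.
    generator = mii.mii_query_handle("bloom560m_deployment")
    result = generator.query({"query": ["DeepSpeed is", "Seattle is"]})
    print(result)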

Under the hood, DeepSpeed-Inference is the engine that drives MII. Based on the model type, batch size, and available hardware resources, MII automatically applies the appropriate set of DeepSpeed-Inference system optimizations to reduce latency and increase throughput. MII and DeepSpeed-Inference do this through several pre-specified model injection rules, which identify the underlying PyTorch model architecture and replace it with an optimized implementation. As a result, the tens of thousands of widely used models supported by MII gain immediate access to DeepSpeed-Inference's comprehensive set of optimizations.
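
For readers who want to see the injection mechanism directly, the sketch below calls DeepSpeed-Inference's public init_inference API, which MII drives internally; keyword arguments such as replace_with_kernel_inject may vary between DeepSpeed releases.

    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Apply the pre-specified injection rules: DeepSpeed identifies the
    # transformer blocks and swaps them for fused, optimized kernels.
    engine = deepspeed.init_inference(model,
                                      dtype=torch.half,
                                      replace_with_kernel_inject=True)

    inputs = tokenizer("DeepSpeed-MII is", return_tensors="pt").to(engine.module.device)
    outputs = engine.module.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0]))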

Open-source repositories

Multiple open-source model repositories, such as Hugging Face, FairSeq, and EleutherAI, provide access to thousands of transformer models. MII supports a variety of tasks, including text generation, question answering, and classification, among others. It supports BERT, RoBERTa, GPT, OPT, and BLOOM models with hundreds of millions of parameters, as shown in the example below. Recent image-generation models, such as Stable Diffusion, are also supported. Furthermore, inference workloads can be either latency-critical or cost-sensitive, where the primary objective is minimizing latency or cost, respectively.
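
As an example of a task beyond text generation, a question-answering model can be served the same way; the snippet below mirrors MII's documented question-answering usage, with a placeholder model name.

    import mii

    mii.deploy(task="question-answering",
               model="deepset/roberta-large-squad2",
               deployment_name="qa_deployment")

    qa = mii.mii_query_handle("qa_deployment")
    result = qa.query({"question": "What does MII optimize?",
                       "context": "MII uses DeepSpeed-Inference optimizations "
                                  "such as deep fusion and tensor-slicing."})
    print(result)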

MII can utilize one of two DeepSpeed-Inference variants. The first, ds-public, is part of the public DeepSpeed library and contains most of the optimizations described above. The second, ds-azure, offers tighter integration with Azure and is available to Microsoft Azure users through MII. MII deployments built on these two variants are referred to as MII-Public and MII-Azure, respectively.
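
Assuming the deployment_type option from MII's examples (an assumption worth verifying against your MII version), choosing between a local MII-Public deployment and an MII-Azure deployment might look like this:

    import mii

    # MII-Public: a local deployment backed by the public DeepSpeed library.
    mii.deploy(task="text-generation",
               model="facebook/opt-1.3b",
               deployment_name="opt_local",
               deployment_type=mii.DeploymentType.LOCAL)

    # MII-Azure: an AML deployment backed by ds-azure (requires Azure setup).
    mii.deploy(task="text-generation",
               model="facebook/opt-1.3b",
               deployment_name="opt_aml",
               deployment_type=mii.DeploymentType.AML)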

Conclusion

MII-Public and MII-Azure offer significant latency and cost savings compared to the open-source PyTorch implementation (baseline), although their performance varies across generation workloads. For latency-critical scenarios, measured at a batch size of one, MII reduces latency by up to 6x across open-source models and workloads. For cost-sensitive scenarios, the team maximized throughput for both the baseline and MII using large batch sizes; under this setting, MII substantially lowers the inference cost of expensive language models such as BLOOM and OPT.

MII-Public can run both locally and on any cloud service. MII implements a lightweight gRPC server and exposes a gRPC inference endpoint for querying the deployment. MII-Azure deployments are served on Azure through AML Inference. The researchers expect MII to benefit a wide range of models: by rapidly reducing inference latency and cost, it enables more advanced AI capabilities in applications and products.
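
Because a local deployment persists as a gRPC server, it can be queried from any other process on the same host and torn down when no longer needed; this sketch reuses the hypothetical deployment name from the earlier example.

    import mii

    # Attach to the running gRPC endpoint created by mii.deploy(...).
    handle = mii.mii_query_handle("bloom560m_deployment")
    print(handle.query({"query": ["DeepSpeed-MII reduces"]}))

    # Shut the deployment down once it is no longer needed.
    mii.terminate("bloom560m_deployment")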
