Microsoft this week announced Tutel, a high-performance mixture-of-experts (MoE) library to facilitate the development of large-scale deep neural network (DNN) models. MoE is a deep learning model architecture in which computational cost grows sublinearly with the number of parameters, making scaling easier.
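To make the sublinear-cost claim concrete, here is a rough back-of-the-envelope sketch. It is not taken from Tutel; it assumes a simplified MoE feed-forward layer in which each expert is a two-matrix block, a linear gate scores every expert, and each token is sent to only `top_k` experts. The function name and the layer sizes are hypothetical, chosen only for illustration.

```python
def moe_layer_costs(num_experts, top_k, d_model, d_hidden):
    """Return (total_params, active_params_per_token) for one simplified MoE layer."""
    # Assumption: each expert is a two-matrix feed-forward block (biases ignored).
    params_per_expert = 2 * d_model * d_hidden
    # Assumption: the gate is a single linear map from d_model to num_experts scores.
    gate_params = d_model * num_experts
    # Every expert has to be stored, so the parameter count grows with num_experts.
    total_params = num_experts * params_per_expert + gate_params
    # Only top_k experts run for a given token, so the weights touched per token
    # depend on top_k (plus the small gate), not on the number of experts.
    active_params = top_k * params_per_expert + gate_params
    return total_params, active_params


if __name__ == "__main__":
    for experts in (8, 64, 512):
        total, active = moe_layer_costs(experts, top_k=2, d_model=1024, d_hidden=4096)
        print(f"experts={experts:4d}  total={total:,}  active per token={active:,}")
```

Under these assumptions, going from 8 to 512 experts multiplies the stored parameters by roughly 64, while the weights used per token rise only by the slightly larger gate.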

Currently, MoE is the only approach demonstrated to scale deep learning models to trillion-plus parameters, paving the way for models capable of learning even more information and powering computer vision, speech recognition, natural language processing, and machine translation systems, among others, that can help people and organizations in new ways.

Tutel is highly optimized for the new Azure NDm A100 v4 series, now generally available. With Tutel’s diverse and flexible MoE algorithmic support, developers across AI domains can execute MoE more easily and efficiently. For a single MoE layer, Tutel achieves an 8.49x speedup on one NDm A100 v4 node with 8 GPUs and a 2.75x speedup on 64 NDm A100 v4 nodes with 512 A100 GPUs, compared with state-of-the-art MoE implementations such as the one in fairseq, Meta’s Facebook AI Research Sequence-to-Sequence Toolkit for PyTorch. All experiments reported here were run on Azure NDm A100 v4 nodes, each with 8 x 80 GB NVIDIA A100 GPUs and an 8 x 200 gigabits per second InfiniBand network.

For end-to-end performance, Tutel, benefiting from an optimization for all-to-all communication, achieves a speedup of more than 40 percent with 64 NDm A100 v4 nodes on the 1.1 trillion-parameter MoE language model from Meta (formerly Facebook). Tutel offers broad compatibility and a rich feature set to ensure strong performance on the Azure NDm A100 v4 cluster. Tutel is open source and has been integrated into fairseq.

Tutel provides diverse and flexible support for state-of-the-art MoE algorithms, including support for:

  • an arbitrary K setting for the Top-K gating algorithm (most implementations support only Top-1 and Top-2)
  • different exploration strategies, including batch-prioritized routing, input dropout, and input jitter
  • different levels of precision, including half precision (FP16), full precision (FP32), and mixed precision (BF16 support is planned for the next release)
  • different types of devices, including both NVIDIA CUDA and AMD ROCm devices
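
As an illustration of the first item in the list, the sketch below shows what Top-K gating with an arbitrary K generally looks like for a single token. It is written in plain Python and is not Tutel’s API or implementation: the function name `top_k_gate` is hypothetical, and renormalizing the selected weights so they sum to 1 is an assumption, since implementations differ on that step.

```python
import math


def top_k_gate(logits, k):
    """Route one token: return [(expert_index, weight), ...] for its k best experts."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Numerically stable softmax over the gate scores of all experts.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    denom = sum(exps)
    probs = [e / denom for e in exps]
    # Keep the k experts with the highest probability, for any valid k.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: renormalize so the kept weights sum to 1.
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


if __name__ == "__main__":
    gate_logits = [0.1, 2.0, -1.0, 1.5]  # made-up scores for four experts
    for k in (1, 2, 3):
        print(k, top_k_gate(gate_logits, k))
```

With K = 1 or K = 2 this reduces to the familiar Top-1 and Top-2 routing; larger K simply keeps more experts per token.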

Tutel will be actively integrating various emerging MoE algorithms from the open-source community.
