DeepSpeed

DeepSpeed is an open source deep learning optimization library for PyTorch.[1] The library is designed to reduce computing power and memory use and to train large distributed models with better parallelism on existing computer hardware.[2][3] DeepSpeed is optimized for low-latency, high-throughput training.
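The library exposes a small Python API: a model is handed to DeepSpeed together with a JSON-style configuration, and the returned engine manages data parallelism, mixed precision, and optimizer sharding internally. The following is a minimal sketch, assuming DeepSpeed is installed and the script is started with the `deepspeed` launcher on CUDA hardware; the model, batch size, and learning rate are placeholders, not recommendations.

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},          # mixed precision reduces memory use
    "zero_optimization": {"stage": 1},  # shard optimizer states across ranks
}

# initialize() wraps the model in a distributed engine that handles
# device placement, gradient communication, and loss scaling.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    inputs = torch.randn(32, 1024, device=model_engine.device)
    loss = model_engine(inputs).pow(2).mean()  # dummy objective
    model_engine.backward(loss)  # engine-managed backward pass
    model_engine.step()          # optimizer step plus gradient clearing
```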

It includes the Zero Redundancy Optimizer (ZeRO) for training models with 1 trillion or more parameters.[5] The team claimed to achieve up to a 6.2x throughput improvement, 2.8x faster convergence, and 4.6x less communication.
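ZeRO reaches these model sizes by removing the memory redundancy of plain data parallelism: optimizer states (stage 1), gradients (stage 2), and the parameters themselves (stage 3) are partitioned across workers, and state can additionally be offloaded to CPU memory. The configuration below is a hedged sketch using DeepSpeed's documented config keys; the batch size and offload targets are illustrative assumptions, not tuned values.

```python
# ZeRO stage 3 configuration sketch; passed as `config` to
# deepspeed.initialize() in the same way as the example above.
zero3_config = {
    "train_batch_size": 2048,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # stage 1 shards optimizer states, stage 2 adds
                     # gradients, stage 3 also shards the parameters
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in host RAM
        "offload_param": {"device": "cpu"},      # offload parameters as well
    },
}
```

With stage 3, no single GPU ever holds a full copy of the model state, which is what pushes the feasible parameter count far beyond the memory of any one device.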