Megatron-LM

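Tensor Parallelism (TP): Splits the weight matrices inside each layer across GPUs
Pipeline Parallelism (PP): Splits the model's layers into sequential stages, each placed on different GPUs
Sequence Parallelism (SP): Splits activations along the sequence dimension across GPUs
Expert Parallelism (EP): Distributes the experts of Mixture-of-Experts models across GPUs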
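
To make the tensor-parallel idea concrete, here is a minimal single-process sketch of a column-parallel linear layer. It is an illustration only and does not use Megatron-LM's own code: `matmul` and `column_parallel_linear` are hypothetical helper names, the "ranks" are simulated with list slices, and concatenation stands in for the all-gather that a real multi-GPU run would perform.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_parallel_linear(x, weight, tp_size):
    """Simulate y = x @ weight with the weight's columns split over tp_size ranks.

    Hypothetical helper for illustration; not part of Megatron-LM.
    """
    out_features = len(weight[0])
    if out_features % tp_size != 0:
        raise ValueError("output columns must divide evenly across ranks")
    width = out_features // tp_size

    # Each simulated rank keeps one contiguous block of output columns.
    shards = [
        [row[rank * width:(rank + 1) * width] for row in weight]
        for rank in range(tp_size)
    ]

    # Every rank multiplies the full input by its own shard.
    partials = [matmul(x, shard) for shard in shards]

    # Concatenating the per-rank outputs row by row stands in for an all-gather.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
weight = [[1, 0, 2, 1], [0, 1, 1, 3]]

full = matmul(x, weight)
sharded = column_parallel_linear(x, weight, tp_size=2)
print(sharded == full)
```

Each simulated rank stores only a fraction of the weight, which is the memory saving tensor parallelism is after. The cost is the communication step needed to reassemble the output.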

NVIDIA's framework for training multi-billion-parameter transformer models

Overview

Megatron-LM is NVIDIA’s research framework for training large transformer language models using tensor, pipeline, sequence, and expert parallelism.
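
These forms of parallelism are combined in a single job, so their sizes have to fit the number of GPUs available. The sketch below shows the basic arithmetic under simplifying assumptions: only tensor and pipeline parallelism are considered, and whatever GPUs remain are treated as data-parallel replicas. The function name `data_parallel_size` is hypothetical, and the `tp` and `pp` parameters are meant to mirror the values passed to Megatron-LM's `--tensor-model-parallel-size` and `--pipeline-model-parallel-size` arguments.

```python
def data_parallel_size(world_size, tp, pp):
    """Return how many model replicas fit on world_size GPUs.

    Hypothetical helper for illustration; assumes only tensor (tp) and
    pipeline (pp) parallelism consume GPUs within one model replica.
    """
    if min(world_size, tp, pp) < 1:
        raise ValueError("all sizes must be positive integers")

    gpus_per_replica = tp * pp
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp={gpus_per_replica}"
        )

    # The remaining factor is the number of data-parallel replicas.
    return world_size // gpus_per_replica


# Example: 64 GPUs, each layer split 8 ways, layers grouped into 4 stages.
dp = data_parallel_size(world_size=64, tp=8, pp=4)
print(dp)
```

With 64 GPUs, `tp=8`, and `pp=4`, one model replica occupies 32 GPUs, leaving room for two data-parallel replicas. A configuration whose product does not divide the GPU count is rejected up front instead of failing partway through startup.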