Megatron-LM
NVIDIA's framework for training multi-billion-parameter transformer models
Overview
Megatron-LM is NVIDIA’s research framework for training large transformer language models using tensor, pipeline, sequence, and expert parallelism.
Parallelism Strategies
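- Tensor Parallelism (TP): Splits the weight matrices within each layer across GPUs, so every GPU computes part of the same layer
- Pipeline Parallelism (PP): Partitions the model's layers into sequential stages, each placed on a different GPU or group of GPUs
- Sequence Parallelism (SP): Splits activations along the sequence dimension, so each GPU handles part of the sequence
- Expert Parallelism (EP): Distributes the experts of a Mixture-of-Experts (MoE) layer across GPUs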
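To make the first two strategies concrete, here is a minimal pure-Python sketch that simulates them on one machine. It is an illustration only and does not use Megatron-LM's API: the function names (`column_parallel_linear`, `partition_layers`) are hypothetical, the "GPUs" are just list entries, and the even layer split is an assumption made for simplicity.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain dense matrix product x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split w column-wise into equal shards, one per simulated GPU."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]


def column_parallel_linear(x: Matrix, w: Matrix, num_shards: int) -> Matrix:
    """Tensor-parallel sketch: each shard computes a slice of the output columns."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Stand-in for an all-gather: concatenate the column slices row by row.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


def partition_layers(num_layers: int, num_stages: int) -> List[range]:
    """Pipeline-parallel sketch: assign contiguous layer blocks to stages."""
    if num_layers % num_stages != 0:
        raise ValueError("num_layers must be divisible by num_stages")
    per_stage = num_layers // num_stages
    return [range(s * per_stage, (s + 1) * per_stage) for s in range(num_stages)]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
    # The sharded result matches the unsharded product.
    print(column_parallel_linear(x, w, num_shards=2) == matmul(x, w))
    print([list(stage) for stage in partition_layers(num_layers=8, num_stages=4)])
```

The sketch shows that tensor parallelism divides the work inside a single layer, while pipeline parallelism divides the stack of layers. In a real multi-GPU run the concatenation step would be a collective communication operation between devices.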
Supported Models
- GPT (decoder-only)
- BERT (encoder-only)
- T5 (encoder-decoder)
- Mixture-of-Experts (MoE)