Megatron-LM

NVIDIA's framework for training multi-billion parameter transformer models


Overview

Megatron-LM is NVIDIA’s research framework for training large transformer language models using tensor, pipeline, sequence, and expert parallelism.

Parallelism Strategies

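  • Tensor Parallelism (TP): Splits the weight matrices inside each layer (e.g. attention and MLP projections) across GPUs, so every GPU computes a slice of each layer
  • Pipeline Parallelism (PP): Splits the stack of layers into consecutive stages, each assigned to a different GPU or group of GPUs
  • Sequence Parallelism (SP): Partitions activations along the sequence dimension across the tensor-parallel GPUs for operations such as LayerNorm and dropout, which tensor parallelism alone leaves replicated
  • Expert Parallelism (EP): Distributes the experts of Mixture-of-Experts layers across GPUs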
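A minimal sketch of the tensor-parallel idea, assuming the column-then-row split of a two-layer MLP: the first weight matrix is divided by columns and the second by rows. Each rank then computes a partial output, and summing the partials (an all-reduce on real hardware) recovers the full result. This is a simplified illustration. All ranks are simulated in one Python process, the function names (`tensor_parallel_mlp`, `split_columns`, `split_rows`) are hypothetical, ReLU stands in for GeLU, and no Megatron-LM API is used.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x inner) @ (inner x cols) matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def relu(m: Matrix) -> Matrix:
    """Elementwise ReLU (stand-in for GeLU; any elementwise activation works)."""
    return [[max(0, v) for v in row] for row in m]


def split_columns(w: Matrix, tp: int) -> List[Matrix]:
    """Column-parallel split: rank r keeps a contiguous block of output columns."""
    cols = len(w[0])
    assert cols % tp == 0, "hidden size must be divisible by tp"
    step = cols // tp
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(tp)]


def split_rows(w: Matrix, tp: int) -> List[Matrix]:
    """Row-parallel split: rank r keeps the rows matching its column block above."""
    rows = len(w)
    assert rows % tp == 0, "hidden size must be divisible by tp"
    step = rows // tp
    return [w[r * step:(r + 1) * step] for r in range(tp)]


def tensor_parallel_mlp(x: Matrix, w1: Matrix, w2: Matrix, tp: int) -> Matrix:
    """Simulate a tp-way tensor-parallel MLP computing relu(x @ w1) @ w2."""
    w1_shards = split_columns(w1, tp)
    w2_shards = split_rows(w2, tp)
    partials = []
    for rank in range(tp):
        # Each rank sees the full input and a slice of the hidden units,
        # so the activation is applied locally without communication.
        h = relu(matmul(x, w1_shards[rank]))
        # Multiplying by the matching rows of w2 yields a partial output.
        partials.append(matmul(h, w2_shards[rank]))
    # Summing the partials plays the role of the all-reduce across TP ranks.
    return [
        [sum(p[i][j] for p in partials) for j in range(len(w2[0]))]
        for i in range(len(x))
    ]


if __name__ == "__main__":
    x = [[1, -2, 3]]
    w1 = [[1, 0, -1, 2], [0, 1, 1, -1], [2, -1, 0, 1]]
    w2 = [[1, 2], [0, 1], [-1, 1], [3, 0]]
    print(matmul(relu(matmul(x, w1)), w2))        # [[28, 14]]
    print(tensor_parallel_mlp(x, w1, w2, tp=2))   # [[28, 14]]
```

Because the activation is elementwise, each rank can apply it to its own hidden slice. That is why the first matrix is split by columns and the second by rows: in this layout the only communication needed is the final sum.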

Supported Models

  • GPT (decoder-only)
  • BERT (encoder-only)
  • T5 (encoder-decoder)
  • Mixture-of-Experts (MoE)