TorchTitan

PyTorch-native distributed training framework for production LLM pre-training


Overview

TorchTitan is a PyTorch-native framework for pre-training LLMs at scale. It combines four parallelism strategies (data, tensor, pipeline, and context parallelism), which together are referred to as 4D parallelism.
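
To make the idea of 4D parallelism concrete, the sketch below shows how a flat set of GPU ranks can be factored into four parallelism degrees. It is an illustration only and not TorchTitan's API: `ParallelDims`, `infer_dims`, and `rank_to_coords` are hypothetical names, and the dimension ordering (tensor parallel varying fastest, data parallel slowest) is an assumption chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelDims:
    """Hypothetical container for the four parallelism degrees."""
    dp: int  # data parallel degree
    pp: int  # pipeline parallel degree
    cp: int  # context parallel degree
    tp: int  # tensor parallel degree

    @property
    def world_size(self) -> int:
        # The four degrees must multiply to the total number of ranks.
        return self.dp * self.pp * self.cp * self.tp


def infer_dims(world_size: int, pp: int, cp: int, tp: int) -> ParallelDims:
    """Derive the data-parallel degree from the ranks the other three leave over."""
    used = pp * cp * tp
    if used <= 0 or world_size % used != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by pp*cp*tp={used}"
        )
    return ParallelDims(dp=world_size // used, pp=pp, cp=cp, tp=tp)


def rank_to_coords(rank: int, dims: ParallelDims) -> dict[str, int]:
    """Map a flat rank to its coordinate along each dimension.

    Assumed layout: row-major over (dp, pp, cp, tp), so tp varies fastest.
    """
    if not 0 <= rank < dims.world_size:
        raise ValueError(f"rank {rank} is outside [0, {dims.world_size})")
    coords = {}
    for name in ("tp", "cp", "pp", "dp"):  # fastest-varying dimension first
        size = getattr(dims, name)
        coords[name] = rank % size
        rank //= size
    return coords


if __name__ == "__main__":
    dims = infer_dims(world_size=64, pp=2, cp=2, tp=8)
    print(dims)
    print(rank_to_coords(9, dims))
```

With 64 ranks and degrees `pp=2`, `cp=2`, `tp=8`, the remaining factor gives a data-parallel degree of 2. Placing tensor parallelism on the fastest-varying axis is a common choice because it tends to keep the most communication-heavy group on neighboring devices, but the ordering here is only an assumption of this sketch.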

Features

  • Native PyTorch (no external frameworks)
  • 4D Parallelism out of the box
  • Production-ready checkpoint management
  • Integrated with PyTorch Distributed (torch.distributed)