TorchTitan

PyTorch-native distributed training framework for production LLM pre-training

Overview

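TorchTitan is a PyTorch-native framework for pre-training LLMs at scale. It composes four forms of parallelism, together called 4D parallelism: data parallelism (splits the batch), tensor parallelism (splits the weights within a layer), pipeline parallelism (splits the layers into stages), and context parallelism (splits the sequence).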
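
To make the arithmetic behind 4D parallelism concrete, here is a minimal sketch in plain Python. It is not TorchTitan's API: the names `ParallelDims`, `infer_dims`, and `rank_to_coords` are hypothetical, and the dimension ordering (tensor parallel varying fastest) is an assumption chosen for illustration. It shows only that the four degrees must multiply to the total number of GPUs, and how a flat rank could map to one coordinate per dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelDims:
    """Hypothetical container for the four parallel degrees (not TorchTitan's API)."""

    dp: int  # data-parallel degree
    pp: int  # pipeline-parallel degree
    cp: int  # context-parallel degree
    tp: int  # tensor-parallel degree

    @property
    def world_size(self) -> int:
        # Every GPU holds exactly one coordinate in each of the four dimensions.
        return self.dp * self.pp * self.cp * self.tp


def infer_dims(world_size: int, tp: int, pp: int, cp: int) -> ParallelDims:
    """Derive the data-parallel degree from the world size and the other degrees."""
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all degrees and the world size must be >= 1")
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return ParallelDims(dp=world_size // model_parallel, pp=pp, cp=cp, tp=tp)


def rank_to_coords(rank: int, dims: ParallelDims) -> dict[str, int]:
    """Map a flat rank to per-dimension coordinates.

    Assumed layout: row-major over (dp, pp, cp, tp), so tp varies fastest.
    """
    if not 0 <= rank < dims.world_size:
        raise ValueError(f"rank {rank} is outside [0, {dims.world_size})")
    coords: dict[str, int] = {}
    for name in ("tp", "cp", "pp", "dp"):  # fastest-varying dimension first
        size = getattr(dims, name)
        coords[name] = rank % size
        rank //= size
    return coords


if __name__ == "__main__":
    dims = infer_dims(world_size=64, tp=8, pp=2, cp=2)
    print(dims)
    print(rank_to_coords(13, dims))
```

Placing tensor parallelism on the fastest-varying axis keeps each tensor-parallel group on consecutive ranks, which on many clusters means the same node. That is a common layout choice, assumed here rather than taken from the source.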

Features

Native PyTorch (no external frameworks)
4D parallelism out of the box
Production-ready checkpoint management
Integrated with PyTorch distributed (torch.distributed)