TorchTitan
PyTorch-native distributed training framework for production LLM pre-training
Overview
TorchTitan is a PyTorch-native framework for pre-training LLMs at scale. It combines four forms of parallelism: tensor, pipeline, data, and context parallelism, together called 4D parallelism.
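
To make the idea of 4D parallelism concrete, here is a minimal sketch of how a pool of GPUs can be factored into four parallel dimensions. It is plain Python and does not use TorchTitan's API. The function names (`mesh_shape`, `rank_to_coords`), the dimension order (dp, pp, cp, tp), and the row-major rank layout are assumptions made for illustration only.

```python
def mesh_shape(world_size: int, tp: int, pp: int, cp: int) -> dict[str, int]:
    """Derive the data-parallel degree from the other three degrees.

    Hypothetical helper: the product of all four degrees must equal
    the total number of ranks (world_size).
    """
    fixed = tp * pp * cp
    if world_size % fixed != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={fixed}"
        )
    # Dimension order is an illustrative assumption, not TorchTitan's layout.
    return {"dp": world_size // fixed, "pp": pp, "cp": cp, "tp": tp}


def rank_to_coords(rank: int, shape: dict[str, int]) -> dict[str, int]:
    """Map a flat rank to its coordinate along each mesh dimension.

    Assumes a row-major layout: the last dimension varies fastest.
    """
    coords = {}
    for name in reversed(list(shape)):
        rank, coords[name] = divmod(rank, shape[name])
    # Restore the original dimension order for readability.
    return {name: coords[name] for name in shape}


if __name__ == "__main__":
    shape = mesh_shape(world_size=64, tp=8, pp=2, cp=2)
    print(shape)
    print(rank_to_coords(37, shape))
```

With 64 ranks and degrees tp=8, pp=2, cp=2, the remaining factor of 2 becomes the data-parallel degree. Ranks that share every coordinate except one form the communication group for that one dimension.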
Features
- Native PyTorch (no external frameworks)
- 4D Parallelism out of the box
- Production-ready checkpoint management
- Integrates with PyTorch Distributed (torch.distributed)