PyTorch
12 test cases
Native PyTorch distributed training examples covering DDP, FSDP, TorchTitan, DeepSpeed, and more. The examples include LLM pre-training, fine-tuning, RLHF, inference serving, robotics, and multimodal models.
PyTorch FSDP
Fully Sharded Data Parallel training for large language models
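FSDP's core idea is that each rank permanently stores only a 1/N slice of the flattened parameters. It all-gathers the full parameters just in time for computation, then reduce-scatters gradients so each rank keeps only the gradient slice for the parameters it owns. The dependency-free simulation below illustrates that data flow on plain lists. It does not use `torch.distributed`, and `shard`, `all_gather`, `reduce_scatter`, and `sgd_on_shard` are hypothetical stand-ins for the real GPU collectives and optimizer.

```python
from typing import List


def shard(flat: List[float], world_size: int) -> List[List[float]]:
    """Pad a flattened parameter list to a multiple of world_size and split it evenly."""
    pad = (-len(flat)) % world_size
    padded = flat + [0.0] * pad
    size = len(padded) // world_size
    return [padded[r * size:(r + 1) * size] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Concatenate every rank's shard and drop the padding to rebuild the full list."""
    return [x for s in shards for x in s][:numel]


def reduce_scatter(full_grads: List[List[float]], world_size: int) -> List[List[float]]:
    """Average the ranks' full-size gradients, then hand each rank only its own slice."""
    numel = len(full_grads[0])
    averaged = [sum(g[i] for g in full_grads) / world_size for i in range(numel)]
    return shard(averaged, world_size)


def sgd_on_shard(param_shard: List[float], grad_shard: List[float], lr: float) -> List[float]:
    """Each rank updates only the parameters (and optimizer state) that it owns."""
    return [p - lr * g for p, g in zip(param_shard, grad_shard)]


# Two simulated ranks sharing a 5-element "model".
world_size = 2
params = [1.0, 2.0, 3.0, 4.0, 5.0]
param_shards = shard(params, world_size)        # persistent per-rank storage
full = all_gather(param_shards, len(params))    # materialized only for forward/backward
local_grads = [[1.0] * len(full), [3.0] * len(full)]  # pretend per-rank backward results
grad_shards = reduce_scatter(local_grads, world_size)
param_shards = [sgd_on_shard(p, g, lr=0.5) for p, g in zip(param_shards, grad_shards)]
print(all_gather(param_shards, len(params)))    # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Between steps each rank holds only its shard, which is where the memory saving comes from. Real FSDP also overlaps these collectives with computation.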
PyTorch DDP
Distributed Data Parallel training, the foundation of multi-GPU training in PyTorch
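DDP keeps a full model replica on every GPU and gives each process a different slice of the batch. During the backward pass it all-reduces (averages) gradients, so every replica applies the same update. The sketch below simulates that on a one-parameter linear model without any PyTorch dependency. `local_grad`, `all_reduce_mean`, and `ddp_step` are hypothetical helpers, not PyTorch APIs.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) training pairs


def local_grad(w: float, batch: Batch) -> float:
    """d/dw of the mean squared error for the one-parameter model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values: List[float]) -> float:
    """The value every rank holds after a sum all-reduce divided by the world size."""
    return sum(values) / len(values)


def ddp_step(w: float, global_batch: Batch, world_size: int, lr: float) -> float:
    """One data-parallel SGD step; every replica starts from and ends with the same w."""
    per_rank = len(global_batch) // world_size
    # Disjoint, equal-sized slices per rank (DistributedSampler plays this role in PyTorch).
    shards = [global_batch[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]
    grads = [local_grad(w, s) for s in shards]  # independent forward/backward per replica
    g = all_reduce_mean(grads)                  # DDP synchronizes gradients during backward
    return w - lr * g                           # identical update on all replicas


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # generated from y = 2x
w = 0.0
for _ in range(50):
    w = ddp_step(w, data, world_size=2, lr=0.02)
print(round(w, 4))  # 2.0
```

With equal-sized shards and a mean-reduced loss, the averaged per-rank gradient equals the full-batch gradient. This is why DDP reproduces single-GPU training on the combined batch, at least for models without cross-sample layers such as batch norm.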
DeepSpeed
Microsoft DeepSpeed ZeRO optimizer for memory-efficient distributed training
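ZeRO partitions model states across the N_d data-parallel ranks in three cumulative stages: optimizer states (stage 1), then gradients as well (stage 2), then parameters as well (stage 3). The calculator below reproduces the per-GPU memory accounting from the ZeRO paper for mixed-precision Adam: 2 bytes of fp16 parameters, 2 bytes of fp16 gradients, and K = 12 bytes of fp32 optimizer state per parameter. It is a back-of-the-envelope sketch that ignores activations, temporary buffers, and fragmentation, and `zero_memory_gb` is a hypothetical helper, not part of the DeepSpeed API.

```python
def zero_memory_gb(num_params: float, dp_degree: int, stage: int, k: int = 12) -> float:
    """Per-GPU memory for model states (GB = 1e9 bytes) under ZeRO stage 0-3.

    k is bytes of optimizer state per parameter: fp32 master weights, Adam momentum,
    and Adam variance (4 + 4 + 4 = 12). Activations and buffers are not counted.
    """
    psi, n = num_params, dp_degree
    if stage == 0:    # plain data parallelism: everything replicated
        total = (2 + 2 + k) * psi
    elif stage == 1:  # optimizer states partitioned
        total = 2 * psi + 2 * psi + k * psi / n
    elif stage == 2:  # optimizer states and gradients partitioned
        total = 2 * psi + (2 + k) * psi / n
    elif stage == 3:  # optimizer states, gradients, and parameters partitioned
        total = (2 + 2 + k) * psi / n
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return total / 1e9


for stage in range(4):
    print(stage, round(zero_memory_gb(7.5e9, 64, stage), 1))
```

For a 7.5B-parameter model with N_d = 64, this prints 120.0 GB per GPU without partitioning and 31.4, 16.6, and 1.9 GB for stages 1, 2, and 3. These figures match the example worked through in the ZeRO paper.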
TorchTitan
PyTorch-native distributed training framework for production LLM pre-training
Picotron
Lightweight distributed training library for educational and research use
vLLM
High-throughput LLM inference and serving engine
OpenRLHF
Open-source RLHF framework for reward model training and policy optimization
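RLHF reward models are commonly trained on human preference pairs with a Bradley-Terry style objective, -log σ(r_chosen - r_rejected), which pushes the chosen response's score above the rejected one's. The sketch below computes that pairwise loss on plain floats. It illustrates the standard objective rather than OpenRLHF's own code, and the function names are hypothetical.

```python
import math
from typing import List


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)), written as -softplus(-x)."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def pairwise_reward_loss(chosen: List[float], rejected: List[float]) -> float:
    """Mean Bradley-Terry loss -log sigmoid(r_chosen - r_rejected) over preference pairs."""
    if not chosen or len(chosen) != len(rejected):
        raise ValueError("need equally sized, non-empty lists of scores")
    return -sum(log_sigmoid(c - r) for c, r in zip(chosen, rejected)) / len(chosen)


# Scalar scores a reward model might assign to three (chosen, rejected) response pairs.
chosen = [1.5, 0.2, 3.0]
rejected = [0.5, 0.4, -1.0]
print(pairwise_reward_loss(chosen, rejected))
```

In a real pipeline the scores come from a scalar head on a language model and the loss is backpropagated through it. The trained reward model then supplies rewards for the policy-optimization stage (for example, PPO).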
NVRx
NVIDIA Resiliency Extension, a toolkit for fault-tolerant distributed training workloads
NVIDIA Isaac Lab
Sim-to-real robot learning with NVIDIA Isaac Lab on GPU clusters
OpenVLA OFT
OpenVLA vision-language-action model with the Optimized Fine-Tuning (OFT) recipe for robotic manipulation
nanoVLM
Lightweight vision-language model training for embodied AI
V-JEPA 2
Video Joint Embedding Predictive Architecture for physical world understanding