🔥

PyTorch

12 test cases

Native PyTorch distributed training examples covering DDP, FSDP, TorchTitan, DeepSpeed, and more. Includes LLM pre-training, fine-tuning, RLHF, inference serving, robotics, and multimodal models.

🔥

PyTorch FSDP

Fully Sharded Data Parallel training for large language models

FSDP, Sharding, Large Models, Multi-GPU
🔥

PyTorch DDP

Distributed Data Parallel training - the foundation for multi-GPU PyTorch

DDP, Data Parallel, Multi-GPU, Baseline
🔥

DeepSpeed

Microsoft DeepSpeed ZeRO optimizer for memory-efficient distributed training

DeepSpeed, ZeRO, Memory Efficient, Large Models
🔥

TorchTitan

PyTorch native distributed training framework for production LLM pre-training

TorchTitan, Pre-training, 4D Parallelism, Production
🔥

Picotron

Lightweight distributed training library for educational and research use

Picotron, Lightweight, Educational, Research
🚀

vLLM

High-throughput LLM inference and serving engine

vLLM, Inference, Serving, PagedAttention
🔥

OpenRLHF

Open-source RLHF framework for reward model training and policy optimization

RLHF, PPO, DPO, Alignment
🛡️

NVRx

NVIDIA's resilient training toolkit for fault-tolerant distributed workloads

NVRx, Resilience, Fault Tolerance, Checkpointing
🤖

NVIDIA Isaac Lab

Sim-to-real robot learning with NVIDIA Isaac Lab on GPU clusters

Isaac Lab, Robotics, Sim2Real, Physical AI, Reinforcement Learning
🤖

OpenVLA OFT

Open Vision-Language-Action models with fine-tuning for robotic manipulation

OpenVLA, VLA, Robotics, Fine-tuning, Physical AI
🤖

nanoVLM

Lightweight vision-language model training for embodied AI

nanoVLM, VLM, Multimodal, Physical AI, Vision-Language
🤖

V-JEPA 2

Video Joint Embedding Predictive Architecture for physical world understanding

V-JEPA 2, Video, Self-supervised, Physical AI, World Models