Training Frameworks

FSDP · Sharding · Large Models · Multi-GPU

PyTorch FSDP

Fully Sharded Data Parallel training for large language models

DDP · Data Parallel · Multi-GPU · Baseline

PyTorch DDP

Distributed Data Parallel training - the foundation for multi-GPU PyTorch

DeepSpeed · ZeRO · Memory Efficient · Large Models

DeepSpeed

Microsoft DeepSpeed ZeRO optimizer for memory-efficient distributed training

TorchTitan · Pre-training · 4D Parallelism · Production

TorchTitan

PyTorch native distributed training framework for production LLM pre-training

Picotron · Lightweight · Educational · Research

Picotron

Lightweight distributed training library for educational and research use

🚀

vLLM

High-throughput LLM inference and serving engine

vLLM · Inference · Serving · PagedAttention

OpenRLHF

Open-source RLHF framework for training reward models and policy optimization

RLHF · PPO · DPO · Alignment

🚀

NVIDIA Dynamo

Distributed LLM inference with KV cache-aware routing and disaggregated prefill/decode on HyperPod EKS

Dynamo · Inference · KV Cache · Disaggregated · SGLang

MosaicML · Composer · Training Efficiency · Speedups

MosaicML Composer

Training efficiency library with algorithmic speedups and multi-GPU orchestration

Isaac Lab · Robotics · Sim2Real · Physical AI · Reinforcement Learning

NVIDIA Isaac Lab

Sim-to-real robot learning with NVIDIA Isaac Lab on GPU clusters

OpenVLA · VLA · Robotics · Fine-tuning · Physical AI

OpenVLA OFT

Open Vision-Language-Action models with fine-tuning for robotic manipulation

nanoVLM · VLM · Multimodal · Physical AI · Vision-Language

nanoVLM

Lightweight vision-language model training for embodied AI

V-JEPA 2 · Video · Self-supervised · Physical AI · World Models

V-JEPA 2

Video Joint Embedding Predictive Architecture for physical world understanding

Cosmos · World Models · Physical AI · Omnimodal · Video Generation

Cosmos 3

NVIDIA Cosmos 3 Physical AI flywheel — omnimodal world models for generate → post-train → eval

DreamZero · World Models · Physical AI · Robotics · Video Diffusion

DreamZero

14B World-Action Model (WAM) for robotic manipulation via video diffusion on EKS

V-JEPA 2 · Video · Self-supervised · Physical AI · World Models

V-JEPA 2.1

Updated Video Joint Embedding Predictive Architecture for physical world understanding

PointWorld · 3D World Models · Physical AI · Robotics · Point Flow

PointWorld

Distributed 3D world model pre-training for robotic manipulation (NVIDIA + Stanford)

OpenVLA · VLA · Robotics · Physical AI · Vision-Language-Action

OpenVLA

Open Vision-Language-Action model for generalist robotic manipulation

TRL · RLHF · DPO · PPO · Alignment · Reinforcement Learning

TRL (Transformers Reinforcement Learning)

Hugging Face TRL for RLHF, DPO, PPO, and reward model training

vERL · RLHF · PPO · Scalable RL · Alignment · Reinforcement Learning

vERL

Scalable reinforcement learning framework for LLM alignment and post-training

SLIME · Lightweight · Fine-tuning · Efficient · Reinforcement Learning

SLIME

Lightweight distributed training library for efficient LLM fine-tuning

🧪

Model Distillation

Knowledge distillation for compressing large models into smaller, efficient ones

Distillation · Knowledge Transfer · Compression · Model Customisation

⚡

Megatron / NeMo

6 test cases

Megatron-LM · Tensor Parallel · Pipeline Parallel · Expert Parallel

Megatron-LM

NVIDIA's framework for training multi-billion parameter transformer models

NeMo · Pre-training · Fine-tuning · PEFT · Multi-modal

NVIDIA NeMo

End-to-end framework for building, training, and deploying AI models

NeMo RL

Reinforcement learning from human feedback with NeMo

NeMo · RLHF · PPO · Reward Models

🧪

BioNeMo

NVIDIA's framework for biomolecular AI model training

BioNeMo · Protein · Drug Discovery · ESM

Megatron-Bridge · MoE · Expert Parallel · UCCL · EFA

Megatron-Bridge

NVIDIA Megatron-Bridge + UCCL-EP for MoE training with expert-parallel all-to-all over EFA

NeMo 1.0 (Legacy)

Legacy NeMo 1.0 training examples, superseded by NeMo 2.x

NeMo · Legacy · Deprecated

🧬

JAX

1 test case

🧬

JAX PaxML

JAX-based distributed training with the Google Pax framework

JAX · PaxML · XLA · Auto-parallelism

🧠

AWS Neuron

2 test cases

🧠

NeuronX Distributed

Distributed training on AWS Trainium with NeuronX

NeuronX · Trainium · Inferentia · Custom Silicon

🧠

Optimum Neuron

Hugging Face Optimum for training and inference on AWS Trainium and Inferentia

Optimum Neuron · HuggingFace · Trainium · Inferentia

🎯

Reinforcement Learning

4 test cases

Isaac Lab · Robotics · Sim2Real · Physical AI

NVIDIA Isaac Lab

Sim-to-real robot learning with NVIDIA Isaac Lab on GPU clusters

TRL (Transformers Reinforcement Learning)

Hugging Face TRL for RLHF, DPO, PPO, and reward model training

TRL · RLHF · DPO · PPO · Alignment

vERL · RLHF · PPO · Scalable RL · Alignment

vERL

Scalable reinforcement learning framework for LLM alignment and post-training

SLIME · Lightweight · Fine-tuning · Efficient

SLIME

Lightweight distributed training library for efficient LLM fine-tuning

🧪

Model Customisation

1 test case

🧪

Model Distillation

Knowledge distillation for compressing large models into smaller, efficient ones

Distillation · Knowledge Transfer · Compression

🤖

Physical AI & Robotics

9 test cases

Isaac Lab · Robotics · Sim2Real · Physical AI · Reinforcement Learning

NVIDIA Isaac Lab

Sim-to-real robot learning with NVIDIA Isaac Lab on GPU clusters

OpenVLA · VLA · Robotics · Fine-tuning · Physical AI

OpenVLA OFT

Open Vision-Language-Action models with fine-tuning for robotic manipulation

nanoVLM · VLM · Multimodal · Physical AI · Vision-Language

nanoVLM

Lightweight vision-language model training for embodied AI

V-JEPA 2 · Video · Self-supervised · Physical AI · World Models

V-JEPA 2

Video Joint Embedding Predictive Architecture for physical world understanding

Cosmos · World Models · Physical AI · Omnimodal · Video Generation

Cosmos 3

NVIDIA Cosmos 3 Physical AI flywheel — omnimodal world models for generate → post-train → eval

DreamZero · World Models · Physical AI · Robotics · Video Diffusion

DreamZero

14B World-Action Model (WAM) for robotic manipulation via video diffusion on EKS

V-JEPA 2 · Video · Self-supervised · Physical AI · World Models

V-JEPA 2.1

Updated Video Joint Embedding Predictive Architecture for physical world understanding

PointWorld · 3D World Models · Physical AI · Robotics · Point Flow

PointWorld

Distributed 3D world model pre-training for robotic manipulation (NVIDIA + Stanford)