🎯

Reinforcement Learning

4 test cases

RLHF, DPO, PPO, and scalable RL frameworks for LLM alignment and post-training. Train reward models, optimize policies, and align models with human preferences at scale.

🤖

NVIDIA Isaac Lab

Sim-to-real robot learning with NVIDIA Isaac Lab on GPU clusters

Isaac Lab, Robotics, Sim2Real, Physical AI

🎯

TRL (Transformer Reinforcement Learning)

Hugging Face TRL for RLHF, DPO, PPO, and reward model training

TRL, RLHF, DPO, PPO, Alignment

🎯

vERL

Scalable reinforcement learning framework for LLM alignment and post-training

vERL, RLHF, PPO, Scalable RL, Alignment

🎯

SLIME

Lightweight distributed training library for efficient LLM fine-tuning

SLIME, Lightweight, Fine-tuning, Efficient