NeMo RL

Reinforcement learning from human feedback with NeMo

Overview

NeMo RL extends NeMo with reinforcement learning capabilities for model alignment, covering RLHF, PPO, and reward model training.
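For readers new to these terms, the following is a minimal, framework-free sketch of the two objectives usually meant by "reward model training" and "PPO" in an RLHF setting. It is not NeMo RL's API: the function names `pairwise_reward_loss` and `ppo_clipped_objective`, their parameters, and the default `clip_eps=0.2` are assumptions made for illustration, and the sketch operates on single scalar values rather than batched tensors.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper for illustration; the loss is low when the reward
    model scores the preferred response above the rejected one.
    """
    diff = r_chosen - r_rejected
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); computed in a
    # numerically stable form to avoid overflow for large |diff|.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def ppo_clipped_objective(
    logp_new: float, logp_old: float, advantage: float, clip_eps: float = 0.2
) -> float:
    """PPO clipped surrogate objective for one action (to be maximized).

    Hypothetical helper for illustration; clip_eps=0.2 is an assumed default.
    """
    # Probability ratio between the updated policy and the sampling policy.
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum removes the incentive to push the ratio outside
    # the [1 - clip_eps, 1 + clip_eps] interval.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In an RLHF pipeline, a reward model trained with a loss of the first kind typically supplies the signal from which the advantages in the second function are estimated.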