PyTorch DDP

Distributed Data Parallel training: the foundation for multi-GPU training in PyTorch


Overview

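PyTorch Distributed Data Parallel (DDP) is the standard approach to data-parallel training on multiple GPUs. It runs one process per GPU, and each process holds a full replica of the model and trains on a different slice of the data. During the backward pass, DDP averages gradients across processes with an all-reduce, overlapping that communication with the rest of the backward computation. Every replica therefore applies the same update, and the copies stay in sync.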
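The gradient synchronization described above can be illustrated without GPUs. The following is a minimal plain-Python simulation, not DDP's implementation. It assumes a toy one-weight linear model with a mean-squared-error loss, and `grad_mse`, `all_reduce_mean`, and `ddp_step` are hypothetical helpers. Each simulated replica computes a gradient on its own shard, the gradients are averaged (the job DDP's all-reduce does), and every replica applies the same step.

```python
from statistics import fmean


def grad_mse(w: float, shard: list[tuple[float, float]]) -> float:
    """d/dw of mean((w*x - y)^2) over one replica's shard of (x, y) pairs."""
    return fmean(2 * (w * x - y) * x for x, y in shard)


def all_reduce_mean(values: list[float]) -> float:
    """Stand-in for the all-reduce: every replica ends up with the average."""
    return fmean(values)


def ddp_step(replica_weights: list[float], shards, lr: float) -> list[float]:
    # Each replica computes a gradient using only its own shard of the data.
    local_grads = [grad_mse(w, s) for w, s in zip(replica_weights, shards)]
    g = all_reduce_mean(local_grads)
    # Every replica applies the identical averaged gradient, so they stay in sync.
    return [w - lr * g for w in replica_weights]


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # samples from y = 2x
shards = [data[:2], data[2:]]  # two simulated "GPUs" with equal-sized shards
weights = [0.0, 0.0]           # identical initial replicas
for _ in range(50):
    weights = ddp_step(weights, shards, lr=0.01)
print(weights)  # both replicas hold the same value, close to 2.0
```

When the shards are the same size, the average of the per-shard gradients equals the gradient over the whole batch. Under that assumption, data-parallel training with N replicas matches single-device training on a batch N times larger.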

When to Use DDP

  • The model fits in a single GPU's memory (DDP keeps a full copy on every GPU)
  • You want to scale training across multiple GPUs or nodes
  • You want a simple setup with minimal code changes
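The first condition can be checked roughly before launching a job. The sketch below is hedged: it assumes fp32 training with Adam and uses the common rule of thumb of 16 bytes per parameter (4 for weights, 4 for gradients, 8 for Adam's two moment buffers). Activation memory must be estimated separately and passed in. `model_state_gib` and `fits_on_gpu` are hypothetical helpers, not PyTorch APIs.

```python
def model_state_gib(num_params: int, bytes_per_param: int = 16) -> float:
    """Rough GiB of per-GPU model states for fp32 training with Adam:
    4 B weights + 4 B gradients + 8 B Adam moments = 16 B per parameter.
    Excludes activations, temporary buffers, and DDP's communication buckets."""
    return num_params * bytes_per_param / 2**30


def fits_on_gpu(num_params: int, gpu_gib: float, activation_gib: float) -> bool:
    """Hypothetical check: do model states plus an activation estimate fit on one GPU?
    Under DDP every GPU holds all of this, so adding GPUs does not shrink it."""
    return model_state_gib(num_params) + activation_gib <= gpu_gib


print(f"{model_state_gib(1_000_000_000):.1f} GiB")               # 14.9 GiB
print(fits_on_gpu(1_000_000_000, gpu_gib=24, activation_gib=6))  # True
print(fits_on_gpu(7_000_000_000, gpu_gib=80, activation_gib=0))  # False (~104 GiB)
```

Treat the result as a lower bound. By default, DDP's gradient communication buckets can add roughly one more gradient-sized buffer. Passing `gradient_as_bucket_view=True` to `DistributedDataParallel` is documented to reduce that overhead.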

Quick Start

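import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
# Launch one process per GPU: torchrun --nproc_per_node=<num_gpus> train.py
dist.init_process_group("nccl")  # reads RANK, WORLD_SIZE, MASTER_ADDR/PORT set by torchrun
local_rank = int(os.environ["LOCAL_RANK"])  # this process's GPU index on its node
torch.cuda.set_device(local_rank)
model = torch.nn.Linear(10, 10).to(local_rank)  # placeholder: substitute your own model
model = DDP(model, device_ids=[local_rank])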
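Besides wrapping the model, each process needs its own slice of the dataset. In PyTorch this is usually handled by `torch.utils.data.distributed.DistributedSampler`. The plain-Python sketch below mirrors how that sampler assigns indices when `shuffle=False` and `drop_last=False`, which makes the padding and striding visible. `shard_indices` is a hypothetical helper for illustration, not part of PyTorch.

```python
import math


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> list[int]:
    """Indices that `rank` would draw, mirroring DistributedSampler with
    shuffle=False and drop_last=False (hypothetical helper, not a PyTorch API)."""
    num_samples = math.ceil(dataset_len / num_replicas)  # per-rank sample count
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by wrapping around to the start so the total divides evenly by num_replicas.
    padding = total_size - dataset_len
    indices += (indices * math.ceil(padding / dataset_len))[:padding]
    # Rank r takes every num_replicas-th index, starting at position r.
    return indices[rank:total_size:num_replicas]


for r in range(4):
    print(r, shard_indices(10, num_replicas=4, rank=r))
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6, 0]
# 3 [3, 7, 1]
```

In a real script you would pass `DistributedSampler(dataset)` as the `sampler=` argument of `DataLoader`. When shuffling, call `sampler.set_epoch(epoch)` at the start of each epoch so every epoch uses a different permutation. Because of the wrap-around padding, a few samples can appear twice per epoch. Keep this in mind when computing metrics over the full dataset.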