PyTorch DDP

Distributed Data Parallel training - the foundation for multi-GPU PyTorch

Overview

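PyTorch Distributed Data Parallel (DDP) is the standard approach for multi-GPU data-parallel training. It runs one process per GPU, each holding a full replica of the model and processing a different shard of every batch. During the backward pass, DDP averages gradients across processes with an all-reduce, so every replica applies the same update and the weights stay in sync.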
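To make the gradient synchronization concrete, here is a toy sketch in plain Python (no PyTorch, no real communication). It assumes a one-parameter linear model with a mean-squared-error loss and two equal-sized shards; the helper names `grad_mse` and `all_reduce_mean` are made up for illustration and are not part of any library. It shows that averaging per-replica gradients gives the same value as the gradient over the full batch.

```python
# Toy illustration only: simulates the effect of gradient averaging
# across replicas. Real DDP does this with collective communication.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def all_reduce_mean(values):
    """Stand-in for all-reduce: every replica ends up with the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Two hypothetical replicas, each seeing an equal-sized shard of the batch.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
local_grads = [grad_mse(w, sx, sy) for sx, sy in shards]

# After synchronization, both replicas hold the same gradient.
synced_grads = all_reduce_mean(local_grads)

# Reference: gradient computed on the whole batch by a single process.
full_batch_grad = grad_mse(w, xs, ys)
```

The equality relies on the shards being the same size; with uneven shards, a plain mean of per-replica gradients would weight samples unequally.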

When to Use DDP

The model fits in the memory of a single GPU
You want to scale training across multiple GPUs or nodes
You want a simple setup with minimal code changes

Quick Start

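import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")  # rank and world size come from env vars (set by torchrun)
local_rank = int(os.environ["LOCAL_RANK"])  # assumes the script is launched with torchrun
torch.cuda.set_device(local_rank)  # bind this process to its own GPU
model = DDP(model.to(local_rank), device_ids=[local_rank])  # model: your existing nn.Module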
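The snippet above covers only process-group setup and model wrapping. As a rough sketch of how the remaining pieces might fit together, the example below adds a sharded data loader and a training loop. Everything specific in it is an assumption for illustration: the function name `train`, the tiny `nn.Linear` model, the random dataset, the hyperparameters, and the fallback to the `gloo` backend on CPU when no GPU is available. The environment-variable defaults exist only so the sketch can also run as a single process without a launcher.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train(num_epochs=2):
    # torchrun sets these variables; the defaults are an assumption that
    # lets the sketch run as a single process for local experiments.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))

    # NCCL for GPUs, Gloo as a CPU fallback.
    use_cuda = torch.cuda.is_available()
    dist.init_process_group("nccl" if use_cuda else "gloo",
                            rank=rank, world_size=world_size)
    if use_cuda:
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
    else:
        device = torch.device("cpu")

    # Placeholder model; substitute your own nn.Module.
    model = nn.Linear(10, 1).to(device)
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)

    # Placeholder data. DistributedSampler gives each rank a distinct shard.
    dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(inputs), targets)
            loss.backward()  # gradients are all-reduced during this call
            optimizer.step()

    dist.destroy_process_group()
    return loss.item()
```

To use it across GPUs, call `train()` at the bottom of a script and launch one process per GPU, for example with `torchrun --nproc_per_node=4 train.py` (the script name and process count are placeholders).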