NCCL Tests
NVIDIA Collective Communications Library benchmarks for GPU cluster networking
Overview
NCCL Tests measure the performance of collective communication operations (AllReduce, AllGather, ReduceScatter, etc.) across GPU clusters. They are essential for validating Elastic Fabric Adapter (EFA) connectivity and overall network performance.
Quick Start
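```bash
# Run the all-reduce benchmark: 8 MPI ranks with 1 GPU each (-g 1),
# sweeping message sizes from 8 bytes (-b) to 2 GB (-e), doubling each step (-f 2)
mpirun -np 8 --hostfile hosts /opt/nccl-tests/build/all_reduce_perf -b 8 -e 2G -f 2 -g 1
```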
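To see which message sizes the `-b 8 -e 2G -f 2` sweep actually visits, the helper below enumerates them. It is a minimal sketch: `sweep_sizes` is a hypothetical name, not part of nccl-tests. It assumes a multiplicative step, assumes `2G` means 2 × 1024³ bytes, and ignores any rounding nccl-tests applies to fit the datatype.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the message sizes (bytes) visited by an
# nccl-tests "-b MIN -e MAX -f FACTOR" sweep, one per line.
# Assumes a multiplicative step; datatype rounding is ignored.
sweep_sizes() {
  local size=$1 max=$2 factor=$3
  if [ "$factor" -le 1 ]; then
    echo "factor must be greater than 1" >&2
    return 1
  fi
  while [ "$size" -le "$max" ]; do
    echo "$size"
    size=$((size * factor))
  done
}

# Assumption: the "2G" suffix is interpreted as 2 * 1024^3 bytes.
sweep_sizes 8 $((2 * 1024 * 1024 * 1024)) 2
```

Under these assumptions the sweep covers 29 sizes, from 8 bytes (2³) up to 2 GiB (2³¹), so every run reports results for both latency-bound small messages and bandwidth-bound large ones.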
Key Metrics
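- Bus Bandwidth (busbw): algorithm bandwidth multiplied by a per-collective correction factor (2(n-1)/n for AllReduce across n ranks), so the figure reflects per-link hardware utilization and can be compared directly with the interconnect's peak bandwidth regardless of rank count
- Algorithm Bandwidth (algbw): message size divided by the time for one collective operation (S / t)
- Latency: average time to complete one collective operation, reported in microseconds in the "time" column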
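The relationship between the three metrics can be checked by hand. The sketch below computes algbw = S / t and busbw = algbw × factor, using the correction factors given in the nccl-tests performance notes: 2(n-1)/n for AllReduce, (n-1)/n for AllGather, ReduceScatter, and AllToAll, and 1 for Broadcast and Reduce. The function name `nccl_bw` and its argument order are hypothetical. The size argument is assumed to be the value from the benchmark's size column, and GB/s is taken as 10⁹ bytes per second.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive algbw and busbw (GB/s, 1 GB = 1e9 bytes)
# from a message size in bytes and a per-operation time in microseconds.
# Usage: nccl_bw <collective> <nranks> <size_bytes> <time_us>
nccl_bw() {
  local coll=$1 nranks=$2 size_bytes=$3 time_us=$4
  awk -v coll="$coll" -v n="$nranks" -v s="$size_bytes" -v t="$time_us" 'BEGIN {
    # Per-collective bus bandwidth correction factor
    if (coll == "all_reduce") f = 2 * (n - 1) / n
    else if (coll == "all_gather" || coll == "reduce_scatter" || coll == "alltoall") f = (n - 1) / n
    else if (coll == "broadcast" || coll == "reduce") f = 1
    else { print "unknown collective: " coll > "/dev/stderr"; exit 1 }
    algbw = s / (t * 1e-6) / 1e9   # bytes per second -> GB/s
    printf "algbw=%.2f busbw=%.2f\n", algbw, algbw * f
  }'
}

# Example: a 1 GiB AllReduce across 8 ranks taking 10,000 us per operation.
nccl_bw all_reduce 8 1073741824 10000
# prints: algbw=107.37 busbw=187.90
```

Because busbw removes the rank-count dependence of each collective's data movement, it is the number to compare against the link speed when judging whether EFA or NVLink is performing as expected.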