AWS ParallelCluster
HPC cluster management with Slurm scheduler for distributed training
View on GitHubOverview
AWS ParallelCluster is an open-source cluster management tool that deploys HPC clusters with Slurm scheduling, EFA networking, and auto-scaling compute nodes.
Features
- Slurm job scheduler
- EFA-enabled networking
- Auto-scaling compute nodes
- Custom AMI support
- Multi-queue configuration