AWS ParallelCluster

HPC cluster management with Slurm scheduler for distributed training

View on GitHub

Overview

AWS ParallelCluster is an open-source cluster management tool that deploys HPC clusters with Slurm scheduling, EFA networking, and auto-scaling compute nodes.

Features

  • Slurm job scheduler
  • EFA-enabled networking
  • Auto-scaling compute nodes
  • Custom AMI support
  • Multi-queue configuration