NeuronX Distributed
Distributed training on AWS Trainium with NeuronX
Overview
NeuronX Distributed is a library for training models across AWS Trainium chips with the Neuron SDK. It supports both tensor parallelism and pipeline parallelism.
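To show the idea behind tensor parallelism, here is a minimal pure-Python sketch. It illustrates the concept only and is not the NeuronX Distributed API. A linear layer's weight matrix is split column-wise across a hypothetical `world_size` number of ranks. Each simulated rank multiplies the full input by its own shard, and the partial outputs are concatenated. On real hardware each shard would sit on a separate device, and the concatenation would be a collective all-gather. Here everything runs in a single process, and the helper names (`split_columns`, `column_parallel_forward`) are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product: x is (batch, in_features), w is (in_features, out_features)."""
    return [
        [sum(x[i][k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


def split_columns(w: Matrix, world_size: int) -> List[Matrix]:
    """Shard the weight's output columns evenly across world_size (hypothetical) ranks."""
    out_features = len(w[0])
    assert out_features % world_size == 0, "out_features must divide evenly across ranks"
    chunk = out_features // world_size
    return [[row[r * chunk:(r + 1) * chunk] for row in w] for r in range(world_size)]


def column_parallel_forward(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    """Each simulated rank computes x @ shard; concatenating the partial outputs
    along the feature axis stands in for an all-gather across devices."""
    partials = [matmul(x, shard) for shard in split_columns(w, world_size)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
print(column_parallel_forward(x, w, world_size=2))  # [[1.0, 2.0, 4.0, 7.0]]
print(matmul(x, w))                                 # [[1.0, 2.0, 4.0, 7.0]]
```

The sharded and unsharded results match. Each rank stores only `1 / world_size` of the weight, which is why tensor parallelism helps when a single layer is too large for one device.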
Key Features
- Tensor parallelism for Trainium
- Pipeline parallelism
- Gradient accumulation
- Mixed precision with BF16
- Integration with Hugging Face Optimum
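Gradient accumulation can be checked with a small worked example. The sketch below uses plain Python and is not NeuronX Distributed code. It assumes a one-parameter linear model `y = w * x` with a mean-squared-error loss. Suppose a batch is split into equal-sized micro-batches and each micro-batch gradient is scaled by `1 / num_micro_batches` before being summed. The accumulated gradient then equals the full-batch gradient, so one optimizer step after all micro-batches behaves like a step on the full batch while holding only one micro-batch in memory at a time. The function names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for the toy model y = w * x


def grad_mse(w: float, batch: List[Sample]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over the given batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: List[Sample], micro_batch_size: int) -> float:
    """Accumulate scaled gradients over equal-sized micro-batches."""
    assert len(batch) % micro_batch_size == 0, "micro-batches must be equal-sized"
    num_micro_batches = len(batch) // micro_batch_size
    acc = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        # Scale each contribution so the sum is the full-batch mean gradient.
        acc += grad_mse(w, micro) / num_micro_batches
    return acc


def sgd_step(w: float, batch: List[Sample], lr: float, micro_batch_size: int) -> float:
    """A single optimizer update, applied only after all micro-batches are processed."""
    return w - lr * accumulated_grad(w, batch, micro_batch_size)


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0), (4.0, 9.0)]
print(grad_mse(1.0, batch))             # -15.5 (full batch)
print(accumulated_grad(1.0, batch, 2))  # -15.5 (two micro-batches of 2)
```

In a PyTorch-style training loop the same effect usually comes from calling `backward()` on each scaled micro-batch loss, which adds into the existing gradients, and calling the optimizer's `step()` only once per accumulation window.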