Awesome Distributed Machine Learning System

A curated list of awesome projects and papers for distributed training or inference, especially for large models.

Contents

Open Source Projects

Papers

Survey

Pipeline Parallelism

Sequence Parallelism

Mixture-of-Experts System

Graph Neural Networks System

Hybrid Parallelism & Framework

Memory Efficient Training

Tensor Movement

Auto Parallelization

Communication Optimization

Fault-tolerant Training

Inference and Serving

Applications

Contribute

All contributions to this repository are welcome. Open an issue or send a pull request.