This is the repository for DisTrO, a family of architecture-agnostic and network-agnostic distributed optimizers that reduce inter-GPU communication requirements by four to five orders of magnitude without relying on amortized analysis. This enables low-latency training of large neural networks over slow internet connections and on heterogeneous networking hardware.
- Aug. 26th, 2024: Preliminary Report
- Coming Soon: Paper and Code
- In The Near Future: 👀
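DisTrO's own algorithm is not described here (the paper and code are listed above as "Coming Soon"). As a rough illustration of what "reducing inter-GPU communication by orders of magnitude" means in practice, the sketch below shows generic top-k gradient sparsification, a well-known compression technique that is **not** DisTrO's method: each worker would transmit only the largest-magnitude gradient entries instead of the full dense tensor. The names `compress_grad`, `decompress_grad`, and `k_fraction` are hypothetical and exist only for this example.

```python
# Illustrative only: this is generic top-k gradient sparsification,
# not DisTrO's (unpublished) algorithm. All names here are hypothetical.
import math
import torch

def compress_grad(grad: torch.Tensor, k_fraction: float = 1e-4):
    """Keep only the k largest-magnitude entries of a gradient tensor.

    Returns the values and flat indices a worker would transmit,
    instead of the full dense gradient.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * k_fraction))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices  # payload: k values + k indices

def decompress_grad(values: torch.Tensor, indices: torch.Tensor, shape):
    """Rebuild a dense (mostly zero) gradient from the transmitted payload."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

if __name__ == "__main__":
    g = torch.randn(4096, 4096)              # stand-in for a dense gradient
    vals, idx = compress_grad(g, k_fraction=1e-4)
    g_hat = decompress_grad(vals, idx, g.shape)

    full_bytes = g.numel() * g.element_size()
    sent_bytes = vals.numel() * vals.element_size() + idx.numel() * idx.element_size()
    print(f"dense gradient: {full_bytes / 1e6:.1f} MB")
    print(f"transmitted:    {sent_bytes / 1e6:.3f} MB "
          f"(~{full_bytes / sent_bytes:.0f}x smaller)")
```

Run as a script, this prints the dense payload size next to the compressed one; the reduction scales directly with `k_fraction`. How DisTrO actually achieves its reported reduction will be detailed in the forthcoming paper and code.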
Join us on Discord if you're interested in helping us research and build the future of distributed training.