| branch | macOS | Ubuntu | Format | Coverage |
|---|---|---|---|---|
| main | TBA | TBA | TBA | |
| develop | TBA | TBA | TBA | |
TACOS takes an arbitrary point-to-point network topology as input and automatically synthesizes a topology-aware All-Reduce (Reduce-Scatter followed by All-Gather) collective communication algorithm for it. TACOS is built on the Time-expanded Network (TEN) representation and the Utilization Maximizing Link-Chunk Matching algorithm, which together let it scale to large networks.
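To make the idea of matching chunks to links over time more concrete, here is a toy Python sketch of a greedy All-Gather schedule. It is not the TACOS implementation or its actual matching algorithm: the function name `synthesize_all_gather`, the model of one chunk per link per time step, and the "lowest chunk id first" tie-break are all assumptions made for illustration.

```python
def synthesize_all_gather(num_nodes, links):
    """Toy greedy All-Gather synthesis (illustrative only, not TACOS itself).

    links: directed (src, dst) pairs; each link is assumed to carry
    exactly one chunk per time step.
    Returns a list of steps, each a list of (src, dst, chunk) transfers.
    """
    # Chunk i initially lives on node i.
    holdings = [{i} for i in range(num_nodes)]
    schedule = []

    while any(len(h) < num_nodes for h in holdings):
        step = []
        claimed = set()  # (dst, chunk) pairs already scheduled in this step
        for src, dst in links:
            # Chunks src held at the start of the step that dst still needs
            # and that no other link is already delivering to dst right now.
            candidates = sorted(
                c for c in holdings[src]
                if c not in holdings[dst] and (dst, c) not in claimed
            )
            if candidates:
                chunk = candidates[0]
                claimed.add((dst, chunk))
                step.append((src, dst, chunk))
        if not step:
            # No link can make progress: some node is unreachable.
            raise ValueError("topology is not strongly connected")
        # Apply transfers only after the whole step has been matched,
        # so a chunk cannot hop across two links in one time step.
        for src, dst, chunk in step:
            holdings[dst].add(chunk)
        schedule.append(step)
    return schedule


# A 4-node unidirectional ring: 0 -> 1 -> 2 -> 3 -> 0.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
ring_schedule = synthesize_all_gather(4, ring)
for t, transfers in enumerate(ring_schedule):
    print(f"step {t}: {transfers}")
```

On the ring above, every link is busy in every step and the schedule finishes in three steps. That is the intuition behind maximizing link utilization, although the real algorithm and its cost model are described in the paper rather than here.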
The figure below summarizes the TACOS framework:
For more information about TACOS, please see the following paper.
- William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Ajaya Durg, Samvit Kaul, Swati Gupta, and Tushar Krishna, "TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning," arXiv:2304.05301 [cs.DC]
- Download the TACOS project.
git clone --recurse-submodules git@github.com:astra-sim/tacos.git
- Run TACOS with the provided script.
./tacos.sh
If you'd like to analyze the codebase, runner/main.cpp is the main entry point.
To simplify setting up the execution environment, you can also build a Docker image.
docker build -t tacos .
You can then start a Docker container from that image and use it as a sandboxed execution environment.
docker run -it -v /path/to/your/tacos/repository:/app/tacos tacos
# once the Docker container is running
cd /app/tacos
./tacos.sh
For any questions about TACOS, please contact Will Won or Tushar Krishna. You may also find or open a GitHub Issue in this repository.