NS3-Interface
This repository contains an NS3-based network simulator that acts as a network backend for ASTRA-sim. ASTRA-sim is a distributed machine learning system simulator, developed as a joint collaboration between Georgia Tech, Meta, and Intel. For more details on ASTRA-sim, please refer to the ASTRA-sim repository.
This repository is extended from https://github.com/alibaba-edu/High-Precision-Congestion-Control and simulates various congestion control algorithms. Together with ASTRA-sim, this repository was used to study the performance of different network congestion control algorithms in large-scale collective communications in multi-GPU scenarios. More details on the study can be found in our paper:
T. Khan, S. Rashidi, S. Sridharan, P. Shurpali, A. Akella and T. Krishna, "Impact of RoCE Congestion Control Policies on Distributed Training of DNNs," in Proceedings of the IEEE Symposium on High-Performance Interconnects (HOTI), 2022 [pdf]
This repository is only compatible with a previous version of ASTRA-sim (ASTRA-sim 1.0), which is available in the `ASTRA-sim-1.0` branch of the ASTRA-sim repository. A new version of this repository that works with ASTRA-sim 2.0 is being developed; this repository will be maintained for backward compatibility. This version of NS3 requires gcc-5, g++-5, and python2.7. It also requires the `python` command to default to python2.7.
Instructions to build and run with this repository
This repository is used as a submodule in the ASTRA-sim repository. Instead of cloning this repository directly, please follow the build instructions in the ASTRA-sim repository.
Input Description
This section describes the knobs you can tune in the NS3 network backend.
Network Topology
The network topology files are located in `simulation/mix`.
Let's take a sample file, `7_nodes_3_switch_topology.txt`, and examine its components.
This file describes a topology where there are 7 nodes, 3 of which are network switches (4, 5, 6). There are 8 links in total.
7 3 8 // {no. of nodes} {no. of switches} {no. of links}
4 5 6 // {List of node id for switches}
4 0 200Gbps 0.0005ms 0 // From here on each line corresponds to one link.
6 0 200Gbps 0.0005ms 0 // {id of one endpoint} {id of another endpoint} {bandwidth} {latency} {error rate of link}
4 1 200Gbps 0.0005ms 0
6 1 200Gbps 0.0005ms 0
5 2 200Gbps 0.0005ms 0
6 2 200Gbps 0.0005ms 0
5 3 200Gbps 0.0005ms 0
6 3 200Gbps 0.0005ms 0
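To make the layout above concrete, here is a minimal Python sketch that reads a topology in this format and checks the counts declared in the header line. It is not part of this repository: the names `parse_topology`, `Topology`, `Link`, and `SAMPLE` are hypothetical, and the handling of `//` comments is an assumption made so that the annotated sample above can be parsed as shown. Bandwidth and latency are kept as raw strings rather than converted to numbers.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Link:
    src: int          # id of one endpoint
    dst: int          # id of the other endpoint
    bandwidth: str    # kept as text, e.g. "200Gbps"
    latency: str      # kept as text, e.g. "0.0005ms"
    error_rate: float


@dataclass
class Topology:
    num_nodes: int
    switch_ids: Set[int]
    links: List[Link]


def parse_topology(text: str) -> Topology:
    """Parse a topology description in the layout shown above (hypothetical helper)."""
    rows = []
    for raw in text.splitlines():
        # Assumption: anything after "//" is an explanatory comment.
        fields = raw.split("//", 1)[0].split()
        if fields:
            rows.append(fields)
    if len(rows) < 2:
        raise ValueError("expected a header line and a switch-id line")

    num_nodes, num_switches, num_links = (int(x) for x in rows[0])
    switch_ids = {int(x) for x in rows[1]}
    links = [
        Link(int(a), int(b), bw, lat, float(err))
        for a, b, bw, lat, err in rows[2:]
    ]

    # Cross-check the body of the file against the header counts.
    if len(switch_ids) != num_switches:
        raise ValueError("switch count does not match the header")
    if len(links) != num_links:
        raise ValueError("link count does not match the header")
    return Topology(num_nodes, switch_ids, links)


# The sample topology from this section, without the annotations.
SAMPLE = """\
7 3 8
4 5 6
4 0 200Gbps 0.0005ms 0
6 0 200Gbps 0.0005ms 0
4 1 200Gbps 0.0005ms 0
6 1 200Gbps 0.0005ms 0
5 2 200Gbps 0.0005ms 0
6 2 200Gbps 0.0005ms 0
5 3 200Gbps 0.0005ms 0
6 3 200Gbps 0.0005ms 0
"""

if __name__ == "__main__":
    topo = parse_topology(SAMPLE)
    hosts = sorted(set(range(topo.num_nodes)) - topo.switch_ids)
    print("switches:", sorted(topo.switch_ids))
    print("non-switch nodes:", hosts)
    print("links:", len(topo.links))
```

A check like this can catch a header that was not updated after adding or removing a link before the file is handed to the simulator.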
Configuration
The execution script in the ASTRA-sim repository (`astra-sim/network/frontend/build.sh`) designates which configuration file to use.
Refer to this spreadsheet for a detailed description of each knob.
Contact Us
This project is a collaboration of dedicated professionals. Each core developer and contributor plays a unique role in the project. For any inquiries or questions, feel free to reach out to the corresponding developer based on their expertise.
Developer | Organization | Responsibility | Contact |
---|---|---|---|
Tarannum Khan | Netflix | Development, connecting with ASTRA-sim 1.0 | tkhaniitr@gmail.com |
Saeed Rashidi | Hewlett Packard Labs | ASTRA-sim 1.0 | rashidi1saeid@gmail.com |
Jinsun Yoo | Georgia Tech | Connection with ASTRA-sim 2.0, Current Inquiries | jinsun@gatech.edu |
Tushar Krishna | Georgia Tech | General inquiries | tushar@ece.gatech.edu |