# Managed Collective Communication Service
This repo contains the source code for the SIGCOMM paper MCCS and scripts to reproduce the major results in the paper (Figures 6, 8, and 9).

MCCS's evaluation requires switch configuration to create our evaluation scenarios. We have set up such an environment on our on-premises testbed at Duke. Please leave your public key as a comment on HotCRP so that we can give you access to the servers. We will set up a user for you and then provide instructions for accessing our testbed.
Please coordinate with other reviewers to ensure that only one reviewer is conducting experiments on our servers at a time. Otherwise, the scripts may not work as expected.
## Prerequisites

- rust/cargo
- python3
- justfile
## Build

- Clone the code:

  ```bash
  git clone https://github.com/phoenix-dataplane/mCCS.git
  ```

- Build the CUDA code: under `src/collectives`, run `make` to build the CUDA binaries.

  ```bash
  cd src/collectives
  make -j
  ```

- Build the Rust code: under the project root directory, run `cargo build --release`.

  ```bash
  cargo build --release
  ```
In case of unexpected errors (e.g., scripts were mistakenly launched), run the following command to terminate previous processes:

```bash
just killall
```

Also, make sure no other processes with `mccs` in their command line are running on the corresponding machines. Otherwise, the auto-clean functionality in the scripts may not work properly.
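As a quick sanity check before starting a new run (a suggestion, assuming `pgrep` is available on the machines):

```bash
# List any leftover processes with 'mccs' in their command line.
pgrep -af mccs || echo "no leftover mccs processes"
```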
## Prepare the Environment

Run the following command:

```bash
just prepare-environment
```
## Single-Application Benchmarks (Figure 6)

This step reproduces Figure 6 for the single-application benchmarks.
First, run all MCCS benchmarks under ECMP and the best-fit flow scheduling algorithm from Section 4.3 of the paper. Run the following commands, one at a time, under the root directory of the project:

```bash
just four_gpu_ecmp
just eight_gpu_ecmp
just four_gpu_flow
just eight_gpu_flow
```
Collect the results for MCCS using:

```bash
cd eval/single-app
python collect.py
```
The results will be generated under `eval/plot/data` as CSV files `mccs-[allgather/allreduce]-[4/8]gpu.csv`, where the four settings correspond to Figure 6(a)-(d). We have also prepared pre-run results for the NCCL baselines in the CSV files under `eval/plot/data`.
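As a quick check (a suggestion, not part of the provided scripts), you can confirm that all four result files were produced:

```bash
# Expands to the four CSV names listed above.
ls eval/plot/data/mccs-{allgather,allreduce}-{4,8}gpu.csv
```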
To plot the figures, append the MCCS results in the four CSV files to the corresponding CSV files under `eval/plot/single_app`. Install `matplotlib`, `seaborn`, `pandas`, `numpy`, and `scipy`:

```bash
pip install matplotlib seaborn pandas numpy scipy
```
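A minimal sketch of the append step, assuming each pair of files shares the same header; the target file names under `eval/plot/single_app` are our assumption, so adjust them to the actual files in that directory:

```bash
# Append each MCCS result (header skipped) to the matching baseline CSV.
for app in allgather allreduce; do
  for gpus in 4 8; do
    tail -n +2 "eval/plot/data/mccs-${app}-${gpus}gpu.csv" \
      >> "eval/plot/single_app/${app}_${gpus}gpu.csv"  # assumed target name
  done
done
```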
Then run the plot script:

```bash
cd eval/plot
python single_app/main.py
```

The figures for Figure 6(a)-(d) will be generated under `eval/plot/figures` as `[allgather/allreduce]_[4/8]gpu.pdf`.
However, if you want to run the NCCL baselines yourself, first build NCCL v2.17.1 from the official NCCL repo:

```bash
git clone https://github.com/NVIDIA/nccl.git
cd nccl
git checkout v2.17.1-1
make -j src.build CUDA_HOME=/usr/local/cuda-12.3 NVCC_GENCODE="-gencode=arch=compute_86,code=sm_86"
```
Then build our forked version of the nccl-tests repo in the `nccl-tests-mccs` folder; the changes we made are provided in the `nccl-test.patch` file.

```bash
cd nccl-tests-mccs
make -j MPI=1 MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi CUDA_HOME=/usr/local/cuda-12.3 NCCL_HOME=[path to NCCL v2.17.1 library just built]
```
Make sure the switch is set to the ECMP setting by running:

```bash
./eval/set_ecmp_hashing_algo.sh everything
```

Then run NCCL and NCCL(OR) under all settings:

```bash
cd nccl-tests-mccs/microbenchmark
./one_click_run_nccl_all.sh
```
Collect the results with the command below, then paste them into the corresponding CSV files under `eval/plot/single_app` for the plot script:

```bash
python collect_nccl.py --app [allgather/allreduce] --num-gpus [1/2]
```
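For convenience, the bracketed options can be expanded into a loop (a sketch; the flag values are taken verbatim from the command above):

```bash
# Collect every combination of app and --num-gpus value.
for app in allgather allreduce; do
  for n in 1 2; do
    python collect_nccl.py --app "$app" --num-gpus "$n"
  done
done
```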
## Multi-Application Benchmarks (Figure 8)

This step reproduces Figure 8 for the multi-application benchmarks. First, run the following command under the root directory of the project to get the results of MCCS under all four settings:

```bash
just batched-allreduce-multi-final
```
Collect the results:

```bash
cd eval/multi-app
python collect_multi.py
```
Append the results `multi-setting-[1/2/3/4].csv` under `eval/plot/data` to `setting[1/2/3/4].csv` under `eval/plot/multi_app`, where the pre-run NCCL baseline results are provided.
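This append step can be scripted (a sketch; both file name patterns come from the paragraph above, and we assume each pair of files shares the same header):

```bash
# Append each multi-setting result (header skipped) to its settingN.csv,
# which already holds the pre-run NCCL baselines.
for i in 1 2 3 4; do
  tail -n +2 "eval/plot/data/multi-setting-${i}.csv" \
    >> "eval/plot/multi_app/setting${i}.csv"
done
```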
To generate the figures, run:

```bash
cd eval/plot
python multi_app/main.py
```

The figures will be generated as `setting[1/2/3/4]_allreduce.pdf` under `eval/plot/figures`.
Note that in our scripts, setting 3 corresponds to Setup 4 in the paper, i.e., Figure 8(d); setting 4 corresponds to Setup 3 in the paper, i.e., Figure 8(c).
The config files and scripts to run the multi-application benchmarks for NCCL and NCCL(OR) are in `nccl-tests-mccs/setting[1-4]`. We take setting 2 as an example:

```bash
cd eval/multi-app
# run benchmarks
./run_nccl_all_jobs_multiple_times.sh 20 [goodring/badring] # NCCL(OR) / NCCL
# collect the results
python collect_nccl.py --solution [NCCL GR/NCCL BR] --strip-head 2 --strip-tail 2 # NCCL(OR) / NCCL
```
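Expanded for setting 2, running both ring configurations back to back might look like this (a sketch; the argument values are the two bracketed options above):

```bash
cd eval/multi-app
# NCCL(OR)
./run_nccl_all_jobs_multiple_times.sh 20 goodring
python collect_nccl.py --solution "NCCL GR" --strip-head 2 --strip-tail 2
# NCCL
./run_nccl_all_jobs_multiple_times.sh 20 badring
python collect_nccl.py --solution "NCCL BR" --strip-head 2 --strip-tail 2
```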
Then paste the results into the corresponding CSV files `setting[1/2/3/4].csv` under `eval/plot/multi_app`.
We note that NCCL(OR) and NCCL are equivalent for setting 1 and setting 4.
## Real Workloads (Figure 9)

This step reproduces Figure 9 in the paper. Run the following commands, one at a time, under the root directory of the project:

```bash
just collect-setup4-ecmp
just collect-setup4-normal
```
Collect the results:

```bash
cd eval/multi-app
python collect_real_workload.py
```
Copy the results generated in `eval/plot/data/real_workload.csv` to `eval/plot/real_workload/jct.csv` for the plotting script, where we have provided pre-run results.
```bash
cd eval/plot
python real_workload/plot_jct.py
```

The figure will be generated at `eval/plot/figures/real_workload_jct.pdf`.