This is the code to accompany our paper: Sanders, P. and Uhl, T. N., 2023. Engineering a Distributed-Memory Triangle Counting Algorithm, published in the proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
If you use this code in the context of an academic publication, we kindly ask you to cite it:
```bibtex
@inproceedings{sanders2023,
  author    = {Peter Sanders and
               Tim Niklas Uhl},
  title     = {Engineering a Distributed-Memory Triangle Counting Algorithm},
  booktitle = {{IEEE} International Parallel and Distributed Processing Symposium,
               {IPDPS} 2023, St. Petersburg, FL, USA, May 15-19, 2023},
  pages     = {702--712},
  publisher = {{IEEE}},
  year      = {2023},
  url       = {https://doi.org/10.1109/IPDPS54959.2023.00076},
  doi       = {10.1109/IPDPS54959.2023.00076},
}
```
You can also find a freely accessible postprint on arXiv.
Counting triangles in a graph and incident to each vertex is a fundamental and frequently considered task of graph analysis. We consider how to efficiently do this for huge graphs using massively parallel distributed-memory machines. Unsurprisingly, the main issue is to reduce communication between processors. We achieve this by counting locally whenever possible and reducing the amount of information that needs to be sent in order to handle (possible) nonlocal triangles. We also achieve linear memory requirements despite superlinear communication volume by introducing a new asynchronous sparse-all-to-all operation. Furthermore, we dramatically reduce startup overheads by allowing this communication to use indirect routing. Our algorithms scale (at least) up to 32 768 cores and are up to 18 times faster than the previous state of the art.
To compile this project you need:
- A modern, C++17-ready compiler such as `g++` version 9 or higher or `clang` version 11 or higher
- OpenMPI or Intel MPI
- OpenMP and oneTBB
- Google Sparsehash
- Boost
```bash
git submodule update --init --recursive
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
```
To reproduce our experiments, see the corresponding document.
We provide two base configurations of our algorithm:
- DiTriC, which uses the asynchronous buffered messaging approach from our publication
- CeTriC, which employs our communication reduction technique
While many command-line parameters are available to tune the algorithms' behavior, the variants from the publication are provided as config presets:
```bash
# running DiTriC
mpiexec -np <NUM_PES> build/apps/katric <GRAPH> --config configs/ditric.toml

# running CeTriC
mpiexec -np <NUM_PES> build/apps/katric <GRAPH> --config configs/cetric.toml
```
If you want to enable the grid-based 2D indirect routing of messages, load the additional config `--config configs/2d-grid.toml` after loading one of the base configs, as shown below.
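For example, assuming `--config` can be passed multiple times as described above, a CeTriC run with 2D-grid routing could look like this:

```bash
# CeTriC with grid-based 2D indirect routing; the base config is loaded first
mpiexec -np <NUM_PES> build/apps/katric <GRAPH> \
    --config configs/cetric.toml \
    --config configs/2d-grid.toml
```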
To use the hybrid implementation, add `--config configs/hybrid.toml` and control the number of threads used per MPI rank via the `--num-threads` parameter.
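A hybrid invocation might then look as follows (the thread count of 4 is only an illustrative value):

```bash
# Hybrid CeTriC: base preset, hybrid preset, and the per-rank thread count
mpiexec -np <NUM_PES> build/apps/katric <GRAPH> \
    --config configs/cetric.toml \
    --config configs/hybrid.toml \
    --num-threads 4
```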
By default, this only prints the running time and the total number of triangles to stdout. For detailed metrics of all algorithm phases and communication, use the `--json-output` flag with `stdout` or an output file as parameter.
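As a sketch (the exact argument form and the metrics file name are only examples):

```bash
# Print detailed per-phase metrics as JSON to stdout
mpiexec -np <NUM_PES> build/apps/katric <GRAPH> --config configs/cetric.toml --json-output stdout

# ... or write them to a file instead
mpiexec -np <NUM_PES> build/apps/katric <GRAPH> --config configs/cetric.toml --json-output metrics.json
```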
For additional parameters see `--help`.
You can read undirected graphs from input files in METIS or binary format; the binary format is a lot faster to read. You can find some toy graphs in `./examples`. To convert other graph representations to supported formats, see our graph converter suite. Select the input format using `--input-format`.
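For example, to run on one of the toy graphs (the file name and the format identifier `metis` are assumptions here; check `--help` for the accepted values):

```bash
# Count triangles on a small example graph stored in METIS format
mpiexec -np 4 build/apps/katric examples/<GRAPH>.metis \
    --config configs/cetric.toml --input-format metis
```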
You can also use random graphs generated via KaGen. Choose a generator with the `--gen` flag and set the desired parameters (`--gen_*`). See the KaGen documentation for details.
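A generated-graph run might look like the following sketch; the generator name and the concrete `--gen_*` parameters are illustrative assumptions based on KaGen's naming scheme and may not match this binary exactly:

```bash
# Generate a random graph with KaGen instead of reading one from disk
# (generator and parameter names are illustrative; see --help and the KaGen docs)
mpiexec -np <NUM_PES> build/apps/katric \
    --gen gnm_undirected --gen_n 20 --gen_m 24 \
    --config configs/cetric.toml
```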
Licensed under MIT.