Rmatter generates an R-MAT graph in a fast, space-efficient, and deterministic way.
- Fast: This generator was developed because existing tools were too slow to generate large graphs. Rmatter can generate an R-MAT27 graph (N=2^27, M=2^31) within 3 minutes on my laptop (much faster on many-core servers!).
- If a sufficient number of cores are provided, the processing speed is mostly limited by the disk IO bandwidth.
- Space-Efficient: Rmatter can generate graphs that are much larger than the local RAM size by bounding the total memory usage.
- This idea comes from another R-MAT graph generator PaRMAT.
- Deterministic: No matter how many times you run it, it generates the same graph (thanks to PCG random number generator).
- The output order is nondeterministic, but providing the same random seed will generate an identical graph.
A compiler with C++20 support is required.
Clone and compile:
git clone https://github.com/s417-lama/rmatter.git
cd rmatter/
make
By default, g++
is used for compilation (change Makefile
if needed).
Generate an R-MAT20 graph (N=2^20, M=2^24) and save it as rmat20.txt
:
./rmatter -n 1048576 -m 16777216 -o rmat20.txt
The following output will be displayed:
Generating an R-MAT graph with N = 1048576, M = 16777216 ...
Parameters: a = 0.57, b = 0.19, c = 0.19, d = 0.05
Random seed: 0
Estimated file size: 0.25 GB
12 threads will be spawned.
14.8448 GB of RAM will be used at maximum.
Chunk size: 174762 edges
Number of chunks: 97
[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||] 100%
Done.
The generated graph has successfully been written to: rmat20.txt
The output format is an edge list:
$ head rmat20.txt
524288 598018
4736 66560
1088 2
786432 131361
270865 37388
25472 44304
791314 537
75795 73756
143488 295712
595970 313986
See help:
$ ./rmatter -h
Usage: ./rmatter [options]
options:
-n : Number of vertices
-m : Number of edges
-a : Parameter a (default: 0.57)
-b : Parameter b (default: 0.19)
-c : Parameter c (default: 0.19)
-r : Random seed (default: 0)
-t : Number of threads
-s : Fraction of memory space usage (default: 0.8)
-o : Output filename (default: out.txt)
The show_adj_matrix.py
script will visualize the adjacency matrix and histograms for graphs.
The following python packages will be required:
pip3 install "dask[array]" "dask[dataframe]" plotly
Visualize rmat20.txt
:
python3 show_adj_matrix.py rmat20.txt
Binary edge list format for Gemini graph processing framework:
./converter -i out.txt -o out.binedgelist
- farkhor/PaRMAT: a multi-threaded RMAT graph generator
- Rmatter is much simpler and faster, although PaRMAT has more capabilities (e.g., avoiding duplicated edges).
- Rmatter's space-efficient execution took inspiration from PaRMAT.
- PaRMAT is not deterministic.