Abacus is a novel streaming algorithm for butterfly counting in a fully dynamic graph stream, which entails edge insertions and deletions. Abacus estimates the buttefly counts of by making a single pass over the graph stream.
Here, we explain how to setup Abacus in order to run from your machine.
Any modern x86-based multicore machine can be used to run Abacus.
Abacus requires Java 8
or later. Furthermore, abacus utilizes the fastutil java
library (we use version 7.2.0
and the jar is included in our repo).
The datasets we used can be found here.
If you want to use your own datasets, create a folder at datasets/
folder
containing your selected dataset.
- Columns must be separated by whitespace ' '.
- Each row must correspond to an edge (u v o).
- o denotes the operation, if
o = 1
then addition, otherwise ifo = -1
then deletion. - All nodes in the same column must be from the same partition.
To produce a dataset that contains edge deletions follow the next steps:
- Execute the
preprocess.py
script that is located insrc/makis/python/
folder. By doing this the edge insertionos are created (each line haso = 1
) and any potential comments are removed. - Execute the
addDeletions.py
script that is located insrc/makis/python/
folder. Select a preprocessed dataset from step 1, and insert the deletions ratio(s) (values must be between 0 and 1). - You can find each produced dataset in its corresponding folder.
Note that in the dataset/
folder we have included a reduced version of the Movielens dataset with 20K
edges, which we have
preprocessed (all edges have o = 1
).
For running Abacus execute the callAbacus.py
script that is located in the src/main/python/
folder.
For instance, we show the output of Abacus on Movielens with 20K
edges and with α = 20%
deletions ratio:
Write the location of the data folder from the project root:
datasets
Write the indexes of the datasets (separated by commas)
__MOVIELENS-20K__
1: movielens-20k_alpha=0.2.dat
2: movielens-20k.dat
If you want to use all of them just press Enter
1
These are the selected datasets:
'movielens-20k_alpha=0.2.dat'
Write the memory budgets (separated by commas)
2000
These are the selected ratios:
'2000'
Write the step for writing the butterfly counts: >? 2000
We write butterfly counts every 2000 edges
Which random seed to use?: >? 0
We use 0 as random seed
Compilation finished successfully
budget: 2000
dataPath: ../../../datasets/movielens-20k/movielens-20k_alpha=0.2.dat
action: Accuracy
step: 2000
seed: 0
Memory budget: 2000
2000 0.0
4000 0.0
6000 21.562272777529905
8000 57.7663334895676
10000 224.2870824826828
12000 456.5520715477859
14000 614.0336150072033
16000 1652.6808603115373
18000 1715.4746259721455
20000 2718.6998390141016
22000 3241.4354696702703
24000 1920.631533191521
The total time (in secs) is: 0.093337963
Similarly, for running Parabacus execute the callParabacus.py
script located in the src/main/python/
folder.