Artificial Benchmark for Hypergraphs Community Detection (ABCDH) - A Random Hypergraph Model with Community Structure
Bogumił Kamiński, Paweł Prałat, François Théberge
The project is prepared to be run from command line interface.
Installation instructions:
- install Julia and make sure it is in search path
- clone the project repository
- change directory to
/utils
and runabcdh.jl
file stored there following the example instructions given below.
Here is an example session output (assuming you are in /utils
folder).
$ julia --project abcdh.jl --help
julia --project abcdh.jl -n N -d D -c C -x X -q Q -w W -s S [-o O] [-m] [-h] [--stats]
optional arguments:
-o O prefix for output file names; the generated file names
are [prefix]_deg.txt for degree sequence,
[prefix]_comm.txt for community size sequence,
[prefix]_assign.txt for assignment of vertices to
communities, [prefix]_he.txt for hyperedges
-m if this flag passed a multi-hypergraph is generated; by
default a simple hypergraph is generated
--stats if this flag passed print generated hypergraph
statistics
-h, --help show this help message and exit
required arguments:
-n N number of vertices
-d D either a tuple γ,δ,D or file with degree sequence
(filename cannot contain two commas)
-c C either a tuple β,s,S or file with community size
sequence (filename cannot contain two commas)
-x X mixing parameter ξ
-q Q either a sequence q₁,q₂,...,qₖ of weights of hyperedges
of sizes from 1 to k or a file name with such a sequence
in the same format (if a single number is passed it is
considered to be a filename)
-w W either one of values: ':strict', ':linear', ':majority' or a
file name with weights for w_cd where in i-th line are
comma separated weights for hyperedge of size i+2
ranging from floor(i+2,2)+1 to i; the :strict value
assumes that only c=d weight is non zero, the :linear
value assumes weight equal to c, the :majority value
assumes all weights are 1
-s S seed value for generator
$ julia --project abcdh.jl -n 100000 -d 2.5,5,100 -c 1.5,1000,10000 -x 0.5 -q 0.0,0.4,0.3,0.2,0.1 -w :linear -s 1234 -o result --stats
[ Info: failed to sample an admissible community sequence in 1000 draws. Fixing
[ Info: distribution of hyperedge proportions does not add up to 1. Fixing.
[ Info: Moving 1 stumps from community 2 to background graph
[ Info: Moving 1 stumps from community 4 to background graph
[ Info: Moving 1 stumps from community 10 to background graph
[ Info: Moving 1 stumps from community 12 to background graph
[ Info: Moving 1 stumps from community 14 to background graph
[ Info: Moving 1 stumps from community 20 to background graph
[ Info: Moving 1 stumps from community 22 to background graph
[ Info: Moving 1 stumps from community 23 to background graph
[ Info: Moving 1 stumps from community 24 to background graph
[ Info: Moving 1 stumps from community 26 to background graph
[ Info: Moving 1 stumps from community 29 to background graph
[ Info: Moving 1 stumps from community 35 to background graph
[ Info: Moving 1 stumps from community 36 to background graph
[ Info: fixing 291 hyperedges (0.07% of total number of hyperedges) that were multisets
[ Info: fixing 162 hyperedges (0.04% of total number of hyperedges) that were duplicated
[ Info: Degrees
generated degree distribution
Summary Stats:
Length: 100000
Missing Count: 0
Mean: 10.780250
Minimum: 5.000000
1st Quartile: 5.000000
Median: 7.000000
3rd Quartile: 11.000000
Maximum: 100.000000
Type: Int64
[ Info: Communities
generated community size distribution
Summary Stats:
Length: 38
Missing Count: 0
Mean: 2631.578947
Minimum: 1017.000000
1st Quartile: 1242.000000
Median: 2176.000000
3rd Quartile: 3729.750000
Maximum: 8947.000000
Type: Int64
[ Info: Hyperedges
5×3 DataFrame
Row │ x q vol
│ Int64 Float64 Float64?
─────┼─────────────────────────────────
1 │ 1 0.0 missing
2 │ 2 0.4 0.400119
3 │ 3 0.3 0.300013
4 │ 4 0.2 0.199955
5 │ 5 0.1 0.0999142
allunique(hyperedges) = true
all(allunique, hyperedges) = true
(ξ, ξ_emp) = (0.5, 0.4749853975989391)
4×2 DataFrame
Row │ he_size ξ_emp
│ Int64 Float64
─────┼───────────────────
1 │ 2 0.481998
2 │ 3 0.445908
3 │ 4 0.498098
4 │ 5 0.49248
[ Info: w
5×5 Matrix{Float64}:
1.0 0.0 0.0 0.0 0.0
0.0 1.0 0.4 0.0 0.0
0.0 0.0 0.6 0.428571 0.25
0.0 0.0 0.0 0.571429 0.333333
0.0 0.0 0.0 0.0 0.416667
[ Info: w_emp
5×5 Matrix{Float64}:
NaN 0.0 0.0 0.0 0.0
NaN 1.0 0.444781 0.0 0.0
NaN 0.0 0.555219 0.417643 0.246044
NaN 0.0 0.0 0.582357 0.336687
NaN 0.0 0.0 0.0 0.417269
$ julia --project abcdh.jl -n 100 -d 2.5,5,20 -c 1.5,10,30 -x 0.3 -q 0.0,0.4,0.3,0.2,0.1 -w :linear -s 1234 --stats
[ Info: distribution of hyperedge proportions does not add up to 1. Fixing.
[ Info: Moving 1 stumps from community 2 to background graph
[ Info: Moving 1 stumps from community 5 to background graph
[ Info: added degree to the following nodes due to parity issues: [47]
[ Info: fixing 29 hyperedges (9.15% of total number of hyperedges) that were multisets
[ Info: fixing 11 hyperedges (3.47% of total number of hyperedges) that were duplicated
[ Info: skipping saving generated graph
[ Info: Degrees
generated degree distribution
Summary Stats:
Length: 100
Missing Count: 0
Mean: 8.400000
Minimum: 5.000000
1st Quartile: 6.000000
Median: 7.000000
3rd Quartile: 9.250000
Maximum: 20.000000
Type: Int64
deviations from wanted degrees
1×3 DataFrame
Row │ degs_wanted degs_generated nrow
│ Int64 Int64 Int64
─────┼────────────────────────────────────
1 │ 7 8 1
[ Info: Communities
generated community size distribution
Summary Stats:
Length: 5
Missing Count: 0
Mean: 20.000000
Minimum: 11.000000
1st Quartile: 16.000000
Median: 17.000000
3rd Quartile: 26.000000
Maximum: 30.000000
Type: Int64
[ Info: Hyperedges
5×3 DataFrame
Row │ x q vol
│ Int64 Float64 Float64?
─────┼─────────────────────────────────
1 │ 1 0.0 missing
2 │ 2 0.4 0.42381
3 │ 3 0.3 0.307143
4 │ 4 0.2 0.185714
5 │ 5 0.1 0.0833333
allunique(hyperedges) = true
all(allunique, hyperedges) = true
(ξ, ξ_emp) = (0.3, 0.2870662460567823)
4×2 DataFrame
Row │ he_size ξ_emp
│ Int64 Float64
─────┼───────────────────
1 │ 2 0.353933
2 │ 3 0.139535
3 │ 4 0.282051
4 │ 5 0.357143
[ Info: w
5×5 Matrix{Float64}:
1.0 0.0 0.0 0.0 0.0
0.0 1.0 0.4 0.0 0.0
0.0 0.0 0.6 0.428571 0.25
0.0 0.0 0.0 0.571429 0.333333
0.0 0.0 0.0 0.0 0.416667
[ Info: w_emp
5×5 Matrix{Float64}:
NaN 0.0 0.0 0.0 0.0
NaN 1.0 0.635135 0.0 0.0
NaN 0.0 0.364865 0.464286 0.444444
NaN 0.0 0.0 0.535714 0.222222
NaN 0.0 0.0 0.0 0.333333
$ julia --project abcdh.jl -n 100 -d 2.5,5,20 -c 1.5,10,30 -x 0.3 -q 0.0,0.4,0.3,0.2,0.1 -w :linear -s 1234 --stats -m
[ Info: distribution of hyperedge proportions does not add up to 1. Fixing.
[ Info: Moving 1 stumps from community 2 to background graph
[ Info: Moving 1 stumps from community 5 to background graph
[ Info: added degree to the following nodes due to parity issues: [47]
[ Info: skipping saving generated graph
[ Info: Degrees
generated degree distribution
Summary Stats:
Length: 100
Missing Count: 0
Mean: 8.400000
Minimum: 5.000000
1st Quartile: 6.000000
Median: 7.000000
3rd Quartile: 9.250000
Maximum: 20.000000
Type: Int64
deviations from wanted degrees
1×3 DataFrame
Row │ degs_wanted degs_generated nrow
│ Int64 Int64 Int64
─────┼────────────────────────────────────
1 │ 7 8 1
[ Info: Communities
generated community size distribution
Summary Stats:
Length: 5
Missing Count: 0
Mean: 20.000000
Minimum: 11.000000
1st Quartile: 16.000000
Median: 17.000000
3rd Quartile: 26.000000
Maximum: 30.000000
Type: Int64
[ Info: Hyperedges
5×3 DataFrame
Row │ x q vol
│ Int64 Float64 Float64?
─────┼─────────────────────────────────
1 │ 1 0.0 missing
2 │ 2 0.4 0.42381
3 │ 3 0.3 0.307143
4 │ 4 0.2 0.185714
5 │ 5 0.1 0.0833333
allunique(hyperedges) = false
all(allunique, hyperedges) = false
(ξ, ξ_emp) = (0.3, 0.19873817034700317)
4×2 DataFrame
Row │ he_size ξ_emp
│ Int64 Float64
─────┼────────────────────
1 │ 2 0.241573
2 │ 3 0.104651
3 │ 4 0.25641
4 │ 5 0.0714286
[ Info: w
5×5 Matrix{Float64}:
1.0 0.0 0.0 0.0 0.0
0.0 1.0 0.4 0.0 0.0
0.0 0.0 0.6 0.428571 0.25
0.0 0.0 0.0 0.571429 0.333333
0.0 0.0 0.0 0.0 0.416667
[ Info: w_emp
5×5 Matrix{Float64}:
NaN 0.0 0.0 0.0 0.0
NaN 1.0 0.467532 0.0 0.0
NaN 0.0 0.532468 0.344828 0.461538
NaN 0.0 0.0 0.655172 0.230769
NaN 0.0 0.0 0.0 0.307692
The session shows you:
- How to get help.
- Generating standard hypergraph.
- Generating a small "tight" hypergraph (that has significant number of initial collisions).
- Generating a small "tight" non-simple hypergraph (collisions are accepted).
For steps 3 and 4 results are not saved to disk.
Note that statstics generation with --stats
option can be time consuming for large hypergraphs.