A parallel toolkit implemented with Pthreads (or MPI) to calculate various extrinsic and intrinsic quality metrics (with and without ground truth community structure) for non-overlapping (hard, single membership) clusterings.
The original version is extended with the GNU-style arguments parsing, selective evaluation of the quality measures and the standard Makefile by Artem Lutov artem@exascale.info. The extension is licensed under the Apache License v.2, about the license of the original code please ask the initial authors.
Evaluating quality metrics:
- extrinsic quality metrics (accuracy): F1-measure, NVD; VI, NMI; JI, RI, ARI.
- intrinsic quality metrics (statistical properties): Conductance, Q, Qds, intraEdges, interEdges, intraDensity, contraction, expansion.
Authors: Mingming Chen mileschen2008@gmail.com and Sisi Liu liusisiapply@gmail.com.
Collaborator: Boleslaw K. Szymanski szymab@rpi.edu.
Please email comments and suggestions to Mingming Chen mileschen2008@gmail.com and Sisi Liu liusisiapply@gmail.com.
Paper: Mingming Chen, Sisi Liu, and Boleslaw Szymanski, “Parallel Toolkit for Measuring the Quality of Network Community Structure”, The First European Network Intelligence Conference (ENIC), Wroclaw, Poland, September, 2014, pp. 22-29.
$ ./bin/Release/pcomet -h
Usage: pcomet [OPTIONS] ground-truth|ipnut-network clustering
ground-truth - ground-truth clustering (communities) for the extrinsic
metrics evaluation. The clusterins are specified in the NCL format, where each
line consists of the member node ids of the respective cluster (community).
ipnut-network - input network for the intrinsic metrics evaluation. The
input network is specified in the NSL format, where each line describes the
respective link: <src_id> <dst_id> [<weight>].
clustering - input file, collection of the clusters (detected communities)
to be evaluated. in the NCL format, where each line list member node ids of the
respective cluster (community).
Examples:
$ ./pcomet -n 4 -ef1m ./dataset/football_true_community.groups ./dataset/football_detected_community.groups
$ ./pcomet -n 4 -i ./dataset/football_network.pairs
./dataset/football_detected_community.groups
Extrinsic or intrinsic measures are evaluated. For the extrinsic measures, two
input clusterings (collections of clusters/communities) are compared to each
other, whether one of them typically is a ground-truth clustering. For the
intrinsic measures, the clustering is processed together with the respective
input network (graph).
-h, --help Print help and exit
-V, --version Print version and exit
-n, --num=SHORT the number of threads to be used. (default=`1')
-f, --cpufreq=SHORT CPU frequency to measure timing of the metrics, MHz
(can be fetched by `$ lscpu`). (default=`2100')
Group: gr_metrics
Quality metrics type to be evaluated
-e, --extrinsic[=ENUM] extrinsic quality metrics to be evaluated (possible
values="all", "f1m", "nvd" default=`all')
-i, --intrinsic[=ENUM] intrinsic quality metrics to be evaluated (possible
values="all" default=`all')
-w, --weighted the input network is weighted (default=off)
-d, --directed the input network is directed (default=off)
Example of the output:
$ ./bin/Release/pcomet -n 4 -e dataset/football_true_community.groups dataset/football_detected_community.groups
Executing in 4 threads...
Entropy metric timings: 0.000577824; VI: 0.536747, NMI: 0.924195
Cluster metric timings: 0.000714213; F1-measure: 0.914482, NVD: 0.073913
Index metric timings: 0.000552153; RI: 0.984744, ARI: 0.89665, JI: 0.826389
The
MPI-based
implementation has another (original) execution parameters and output format.Parameters for calculating the metrics with ground truth community structure:
$ mpirun -np 4 ./mpimetric metricType realCommunityFile detectedCommunityFile
Parameters for calculating the metrics with ground truth community structure:mpirun -np 4 ./mpimetric metricType detectedCommunityFile networkFile [isUnweighted] [isUndirected]
Parameters introduction:
metricType
: metricType=1 for metrics with ground truth community structure; metricType=0 for metrics without ground truth community structure.
realCommunityFile
: it is the file of the ground truth community structure.
detectedCommunityFile
: it is the file of the discovered community structure.
networkFile
: the file of the network.
isUnweighted
: it is optional and default value is 1; isUnweighted=1 for unweighted nework; isUnweighted=0 for weighted network.
isUndirected
: it is optional and default value is 1; isUndirected=1 for undirected network; isUndirected=0 for directed network.
numThreads
: the number of threads adopted in the parallel Pthreads program; Its value should be equal to or larger than 1.Output format of
parallel MPI programs
for calculating the metrics with ground truth community structure:numProcs total_running_time_information_theory_metrics computation_time msg_passing_time total_running_time_cluster_matching_metrics computation_time msg_passing_time total_running_time_pair_counting_metrics computation_time msg_passing_time numProcs VI NMI F-measure NVD RI ARI JI
Output format ofparallel MPI programs
for calculating the metrics without ground truth community structure:numProcs total_running_time numProcs modularity modularity_density #intra-edges intra-density contraction #inter-edges expansion conductance
Just $ make
for the Pthread
-based implementation or perform the custom compilation for the MPI-based implementation: $ mpic++ [OPTIONS] src/MPIMetricMain.cpp -o mpcomet
.
- xmeasures - Extrinsic quality (accuracy) measures evaluation for the overlapping clustering on large datasets: family of mean F1-Score (including clusters labeling), Omega Index (fuzzy version of the Adjusted Rand Index) and standard NMI (for non-overlapping clusters).
- GenConvNMI - Overlapping NMI evaluation that is compatible with the original NMI and suitable for both overlapping and multi resolution (hierarchical) clustering evaluation.
- OvpNMI - Another method of the NMI evaluation for the overlapping clusters (communities) that is not compatible with the standard NMI value unlike GenConvNMI, but it is much faster than GenConvNMI.
- Clubmark - A parallel isolation framework for benchmarking and profiling clustering (community detection) algorithms considering overlaps (covers).
- CluSim - A Python module that evaluates (slowly) various extrinsic quality metrics (accuracy) for non-overlapping (single membership) clusterings.
- resmerge - Resolution levels clustering merger with filtering. Flattens hierarchy/list of multiple resolutions levels (clusterings) into the single flat clustering with clusters on various resolution levels synchronizing the node base.
- ExecTime - A lightweight resource consumption profiler.