/louvain-split-communities-openmp

Design of OpenMP-based Parallel Louvain algorithm for community detection, with no internally-disconnected communities.

Primary LanguageC++MIT LicenseMIT

Design of OpenMP-based Parallel Louvain algorithm for community detection,
with no internally-disconnected communities.


Community detection entails the identification of clusters of vertices that exhibit stronger connections within themselves compared to the wider network. The Louvain method, a commonly utilized heuristic for this task, employs a two-step process comprising a local-moving phase and an aggregation phase. This process iteratively optimizes the modularity metric, a measure of community quality. Although widely used, the Louvain method has been noted for generating internally-disconnected and poorly connected communities. In response, Traag et al. introduce the Leiden algorithm, which incorporates an extra refinement phase between the local-moving and aggregation phases. This refinement step enables vertices to explore and potentially create sub-communities within the identified communities from the local-moving phase. In this report, we propose another BFS-based approach, GSP-Louvain, for addressing the same issue. This is different from a number of earlier works, which tackled internally-disconnected communities as a post-processing step.

Below we plot the time taken by the original Leiden, igraph Leiden, NetworKit Leiden, GSP-Louvain on 13 different graphs. GSP-Louvain surpasses the original Leiden, igraph Leiden, and NetworKit Leiden by 341×, 83×, and 6.1× respectively, achieving a processing rate of 328M edges/s on a 3.8𝐵 edge graph.

Below we plot the speedup of GSP-Louvain wrt original Leiden, igraph Leiden, and NetworKit Leiden.

Next, we compare the modularity of communities identified by the original Leiden algorithm, igraph Leiden, NetworKit Leiden, and GSP-Louvain. On average, GSP-Louvain achieves 0.3% lower modularity than the original Leiden and igraph Leiden, respectively, and 25% higher modularity than NetworKit Leiden, particularly evident on road networks and protein k-mer graphs.

Finally, we plot the fraction of disconnected communities identified by each implementation. Absence of bars indicates the absence of disconnected communities. The original Leiden, igraph Leiden, and GSP-Louvain do not identify any communities that are internally-disconnected. However, on average, NetworKit Leiden exhibit fraction of disconnected communities amounting to 1.5×10^−2, particularly on web graphs and social networks.

Refer to our technical reports for more details:
An Approach for Addressing Internally-Disconnected Communities in Louvain Algorithm.


Note

You can just copy main.sh to your system and run it.
For the code, refer to main.cxx.



Code structure

The code structure of GSP-Louvain is as follows:

- inc/_algorithm.hxx: Algorithm utility functions
- inc/_bitset.hxx: Bitset manipulation functions
- inc/_cmath.hxx: Math functions
- inc/_ctypes.hxx: Data type utility functions
- inc/_cuda.hxx: CUDA utility functions
- inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/_iostream.hxx: Input/output stream functions
- inc/_iterator.hxx: Iterator utility functions
- inc/_main.hxx: Main program header
- inc/_mpi.hxx: MPI (Message Passing Interface) utility functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_queue.hxx: Queue utility functions
- inc/_random.hxx: Random number generation functions
- inc/_string.hxx: String utility functions
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions
- inc/batch.hxx: Batch update generation functions
- inc/bfs.hxx: Breadth-first search algorithms
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/dfs.hxx: Depth-first search algorithms
- inc/duplicate.hxx: Graph duplicating functions
- inc/Graph.hxx: Graph data structure functions
- inc/louvian.hxx: Louvian algorithm functions
- inc/louvainSplit.hxx: Louvain with no disconnected communities
- inc/main.hxx: Main header
- inc/mtx.hxx: Graph file reading functions
- inc/properties.hxx: Graph Property functions
- inc/selfLoop.hxx: Graph Self-looping functions
- inc/symmetricize.hxx: Graph Symmetricization functions
- inc/transpose.hxx: Graph transpose functions
- inc/update.hxx: Update functions
- main.cxx: Experimentation code
- process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.



References




ORG DOI