vtraag/leidenalg

Disconnected communities with ModularityVertexPartition on very large graphs

wolfram77 opened this issue · 3 comments

With the webbase-2001 graph, a graph with 118M vertices and and 1.02B directed edges, leidenalg is currently returning 30 disconnected communities (with seed 0).

Vincent notes that this problem is likely due to numerical precision. That is, the ModularityVertexPartition is using scaled improvements to modularity (i.e. 1/m, for consistency with the quality function), which for very large graphs may amount to not seeing any difference (i.e. the improvement of separating two disconnected parts may be positive, but due to the enormous weight, this may effectively be (near) 0). Instead, the RBConfigurationVertexPartition uses unscaled improvements to modularity (i.e. they do not scale with the total weight).

We do not observe any disconnected communities with both RBConfigurationVertexPartition (with libleidenalg) and igraph.

A possible fix may be to switch to double for all computation. Another solution might be for ModularityVertexPartition to extend RBConfigurationVertexPartition and simply overload the quality function (so that quality returns modularity, but diff_move is still not scaled).

@vtraag, does this affect the implementation in igraph as well? I'm guessing yes?

Thanks for the report here @wolfram77!

The problem here is that I want to ensure that the diff_move() function is consistent with the difference in the quality() function after the exact same move. Indeed, for the ModularityVertexPartition, the diff_move() is scaled (i.e. with $1/m$ with $m$ the number of edges) in order to be consistent with the quality() function.

What I'll probably do is separate the diff_move function into two separate parts, one diff_move_raw, which can then be used in the optimisation procedure, and one diff_move which is the scaled version of diff_move_raw used for consistency checks.

@vtraag, does this affect the implementation in igraph as well? I'm guessing yes?

No, this does not affect the igraph implementation, it always uses an unscaled version of the difference (i.e. it doesn't divide by $m$).

Hello @vtraag I was thinking if it would be easier to recommend users to use RBConfigurationVertexPartition instead of ModularityVertexPartition for very large graphs - until you see enough requests for it to directly work with ModularityVertexPartition. If so, we can update the documentation and close this issue immediately.