INTRODUCTION
This package implements the louvain algorithm in C++
and exposes it to python
.
It relies on (python-)igraph
for it to function. Besides the relative
flexibility of the implementation, it also scales well, and can be run on graphs
of millions of nodes (as long as they can fit in memory). The core function is
find_partition
which finds the optimal partition using the louvain algorithm
for a number of different methods. The methods currently implemented are:
-
Modularity. This method compares the actual graph to the expected graph, taking into account the degree of the nodes [1]. The expected graph is based on a configuration null-model. Notice that we use the non-normalized version (i.e. we don't divide by the number of edges), so that this Modularity values generally does not fall between 0 and 1. The formal definition is
H = sum_ij (A_ij - k_i k_j / 2m) d(s_i, s_j),
where
A_ij = 1
if there is an edge between nodei
andj
,k_i
is the degree of nodei
ands_i
is the community of node i. -
RBConfiguration. This is an extension of modularity which includes a resolution parameter [2]. In general, a higher resolution parameter will lead to smaller communities. The formal definition is
H = sum_ij (A_ij - gamma k_i k_j / 2m) d(s_i, s_j),
where
gamma
is the resolution value, and the other variables are the same as for Modularity. -
RBER. A variant of the previous method that instead of a configuration null-model uses a Erdös-Rényi null-model in which each edge has the same probability of appearing [2]. The formal definition is
H = sum_ij (A_ij - gamma p) d(s_i, s_j),
where
p
is the density of the graph, and the other variables are the same as for Modularity, withgamma
a resolution parameter. -
CPM. This method compares to a fixed resolution parameter, so that it finds communities that have an internal density higher than the resolution parameter, and is separated from other communities with a density lower than the resolution parameter [3].The formal definition is
H = sum_ij (A_ij - gamma ) d(s_i, s_j),
with
gamma
a resolution parameter, and the other variables are the same as for Modularity. -
Significance. This is a probabilistic method based on the idea of assessing the probability of finding such dense subgraphs in an (ER) random graph [4]. The formal definition is
H = sum_c M_c D(p_c || p)
where
M_c
is the number of possible edges in communityc
, i.e.n_c (n_c - 1)/2
for undirected graphs and twice that for directed grahs withn_c
the size of communityc
,p_c
is the density of the communityc
, andp
the general density of the graph, andD(x || y)
is the binary Kullback-Leibler divergence. -
Surprise. Another probabilistic method, but rather than the probability of finding dense subgraphs, it focuses on the probability of so many edges within communities [5, 6]. The formal definition is
H = m D(q || <q>)
where
m
is the number of edges,q
is the proportion of edges within communities (i.e.sum_c m_c / m
) and<q>
is the expected proportion of edges within communities in an Erdős–Rényi graph.
INSTALLATION
In short, for Unix: sudo pip install louvain
.
For Windows: download the binary installers.
For Unix like systems it is possible to install from source. For Windows this is overly complicated, and you are recommended to use the binary installation files. There are two things that are needed by this package: the igraph c core library and the python-igraph python package. For both, please see http://igraph.org.
There are basically two installation modes, similar to the python-igraph package itself (from which most of the setup.py comes).
- No C core library is installed yet. The packages will be compiled and linked statically to an automatically downloaded version of the C core library of igraph.
- A C core library is already installed. In this case, the package will link dynamically to the already installed version. This is probably also the version that is used by the igraph package, but you may want to double check this.
In case the python-igraph package is already installed before, make sure that both use the same versions.
The cleanest setup it to install and compile the C core library yourself (make sure that the header files are also included, e.g. install also the development package from igraph). Then both the python-igraph package, as well as this package are compiled and (dynamically) linked to the same C core library.
TROUBLESHOOTING
In case of any problems, best to start over with a clean environment. Make sure
you remove the python-igraph package completely, remove the C core library and
remove the louvain package. Then, do a complete reinstall starting from pip install louvain
. In case you want a dynamic library be sure to then install
the C core library from source before. Make sure you install the same
versions.
USAGE
There is no standalone version of louvain-igraph, and you will always need python to access it. There are no plans for developing a standalone version or R support. So, use python. Please refer to the documentation within the python package for more details on function calls and parameters.
To start, make sure to import the packages:
import louvain
import igraph as ig
We'll create a random graph for testing purposes:
G = ig.Graph.Erdos_Renyi(100, 0.1);
For simply finding a partition use:
part = louvain.find_partition(G, method='Modularity');
In case you want to use a weighted graph, you can store this in an edge attribute:
G.es['weight'] = 1.0;
part = louvain.find_partition(G, method='Modularity', weight='weight');
Please note that not all methods are necessarily capable of handling weighted graphs.
Notice that part
now contains an additional variable, part.quality
which
stores the quality of the partition as calculated by the used method. You can
always get the quality of the partition using another method by calling
part.significance = louvain.quality(G, partition, method='Significance');
You can also find partition for multiplex graphs. For each layer you then
specify the objective function, and the overall objective function is simply the
sum over all layers, weighted by some weight. If we denote by q_k
the quality
of layer k
and the weight by w_k
, the overall quality is then q = sum_k w_k q_k
. This can also be useful in case you have negative links. In
principle, this could also be used to detect temporal communities in a dynamic
setting, cf. [8].
For example, assuming you have a graph with positive weights G_positive
and
a graph with negative weights G_negative
, and you want to use Modularity for
finding a partition, you can use
membership, quality = louvain.find_partition_multiplex([
louvain.Layer(graph=G_positive, method='Modularity', layer_weight=1.0),
louvain.Layer(graph=G_negative, method='Modularity', layer_weight=-1.0)])
Notice the negative layer weight is -1.0
for the negative graph, since we
want those edges to fall between communities rather than within. One particular
problem when using negative links, is that the optimal community is no longer
guaranteed to be connected (it may be a multipartite partition). You may
therefore need the options consider_comms=ALL_COMMS
to improve the quality of
the partition. Notice that this runs much slower than only considering
neighbouring communities (which is the default).
Various methods (such as Reichardt and Bornholdt's Potts model, or CPM) support a (linear) resolution parameter, which can be effectively bisected, cf. [5]. You can do this by calling:
res_parts = louvain.bisect(G, method='CPM', resolution_range=[0,1]);
Notice this may take some time to run, as it effectively calls
louvain.find_partition
for various resolution parameters (depending on the
settings possibly hundreds of times).
Then res_parts
is a dictionary containing as keys the resolution, and as
values a NamedTuple
with variables partition
and bisect_value
, which
contains the partition and the value at which the resolution was bisected (the
value of the bisect_func
of the bisect
function). You could for example plot
the bisection value of all the found partitions by using:
import pandas as pd
import matplotlib.pyplot as plt
res_df = pd.DataFrame({
'resolution': res_parts.keys(),
'bisect_value': [bisect.bisect_value for bisect in res_parts.values()]});
plt.step(res_df['resolution'], res_df['bisect_value']);
plt.xscale('log');
REFERENCES
Please cite the references appropriately in case they are used.
- Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
- Newman, M. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004).
- Reichardt, J. & Bornholdt, S. Partitioning and modularity of graphs with arbitrary degree distribution. Physical Review E 76, 015102 (2007).
- Traag, V. A., Van Dooren, P. & Nesterov, Y. Narrow scope for resolution-limit-free community detection. Physical Review E 84, 016114 (2011).
- Traag, V. A., Krings, G. & Van Dooren, P. Significant scales in community structure. Scientific Reports 3, 2930 (2013).
- Aldecoa, R. & Marín, I. Surprise maximization reveals the community structure of complex networks. Scientific reports 3, 1060 (2013).
- Traag, V.A., Aldecoa, R. & Delvenne, J.-C. Detecting communities using Asymptotical Surprise. Forthcoming (2015).
- Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. & Onnela, J.-P. Community structure in time-dependent, multiscale, and multiplex networks. Science 328, 876–8 (2010).