Clustering results are not always the same given the same resolution parameter
prakritipaul opened this issue · 3 comments
Hello,
I am using the leiden algorithm implementation in iGraph, and noticed that when I repeat clustering using the same resolution parameter, I get different results. Here is some small debugging code I wrote to find this. I tracked the number of clusters post-clustering at each step.
# Do the initial clustering
clustering = do_leiden_clustering(G, resolution_parameter=initial_resolution, n_iterations=n_iterations)
best_num_clusters, best_modularity, best_membership = len(clustering), clustering.modularity, clustering.membership
print(f"best_num_clusters = {best_num_clusters}")
num_clusters = best_num_clusters
i = 0
# Continue until there is a difference between current and previous clustering results.
while num_clusters == best_num_clusters:
clustering = do_leiden_clustering(G, resolution_parameter=initial_resolution, n_iterations=n_iterations)
num_clusters, best_modularity, best_membership = len(clustering), clustering.modularity, clustering.membership
print(f"{i} num_clusters = {num_clusters}")
if num_clusters != best_num_clusters:
print(f"{i} End because num_clusters = {num_clusters} != {best_num_clusters}!")
break
i += 1
# Results
best_num_clusters = 2
0 num_clusters = 2
1 num_clusters = 2
2 num_clusters = 2
3 num_clusters = 2
4 num_clusters = 2
5 num_clusters = 1
5 End because num_clusters = 1 != 2!
Because num_clusters changes at some point, this implies that the clustering results are not deterministic.
Why is this the case?
Thanks!
PC
Sorry for the late reply! I only got back to this after my summer holidays ⛺ 🌞 😎 and other travels 🛫.
The Leiden algorithm is stochastic, meaning that every run, it may return different results. These differences depend on the random numbers being generated while running the algorithm. These differences may be relatively minor if there is a single relatively clear community structure, but the differences will be more substantial if there are multiple possible community structures.
If you want to ensure that you always get identical results, you need to set the seed for the random number generator. In igraph
you can set the seed like this:
import random
random.seed(0)
By default, igraph
used the random number generator from random
, and so setting the seed in the random
package leads to identical random numbers being generated also by igraph
. See some more information here: https://igraph.org/python/versions/latest/api/index.html#set_random_number_generator
If you use the leidenalg
package, you can set the seed directly in the package itself. For example, using the seed
argument in the find_partition
function, or if you are using an Optimiser
object to detect communities you can use the set_rng_seed
function.
Note that this issue tracker is only for the leidenalg
package. The issue tracker for igraph
can be found here, and if your question is really about igraph
, not about leidenalg
, you are more likely to get a response there (since I'm the only one following the leidenalg
issue tracker). Alternatively, you could post your question on the igraph
user forum, if it is a question rather than a bug report.
I'm assuming this explanation clarifies the issue, so I'll close it for now. If you do have other questions, let me know though. If necessary, we can always re-open the issue.
I am sorry for my late response, but yes, thank you! This was very helpful.