results of leiden.community not reproducible across OS
Closed this issue · 10 comments
Hi!
I have run leidenAlg::leiden.community() on the exact same graph g with identical seeds on Windows and on Linux, and the results differ. Is there a known reason (and maybe even a fix) for this?
Cheers,
Marie
Code:
set.seed(168575)
partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)
Expected behaviour: partition is the same when running code on Windows and Linux.
Observed behaviour: partition is different.
Could you try installing the package from this branch? https://github.com/kharchenkolab/leidenAlg/tree/no_cpp
Please check if the problem remains
Are you able to see reproducible clusters across OS using igraph::cluster_leiden()? If so, we could try using this.
Hi, sorry for the delayed response.
I will attach a file that holds the edges and weights for a graph where I observed this problem.
And here is the code I used to read the file, build an igraph object and cluster the graph, which yields different clusters on Windows vs. Linux (also for igraph::cluster_leiden()):
# load edgelist with weights:
df <- read.csv('graph.csv')
# create igraph from data frame:
g <- igraph::graph_from_data_frame(df)
# seed for reproducibility
set.seed(168575)
# Version 1: leidenAlg
partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)
# Version 2: igraph. But also this yields different solutions.
# partition <- igraph::cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')
# get cluster frequencies for easier comparisons between results on Linux and Windows.
clusters_df <- base::data.frame(cluster = base::as.numeric(partition$membership), gene = partition$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))
Yes, this is related to #10
If this is an issue for igraph::cluster_leiden(), it's probably best to create an issue here: https://github.com/igraph/rigraph
@vtraag is on holiday now and he is the one who knows the Leiden algorithm inside and out, but I've taken a cursory glance at the source code in the meantime and made some tests. I can indeed see some nondeterminism for certain graphs and I'll try to find the cause in the next few days. Scratch that, I just forgot to reset the random seed after invoking the algorithm. When I reset the seed, the results seem to be consistent (deterministic) when I am on the same platform. I'll try to test it across different platforms now.
Thanks for the help, @ntamas
(Apologies for being so difficult in recent months as well)
@MarieOestreich So, unfortunately I cannot reproduce this in my environment. I tried your graph.csv file with the following code, which is only slightly modified from what you posted, mostly to test whether the results are consistent on a single platform when re-initializing the seed (and note that I am calling igraph's cluster_leiden):
library(igraph)
# load edgelist with weights:
df <- read.csv('graph.csv')
# create igraph from data frame:
g <- graph_from_data_frame(df)
# seed for reproducibility
set.seed(168575)
# Version 1: leidenAlg
# partition <- leidenAlg::leiden.community(graph = g, n.iterations = 50)
# Version 2: igraph. But also this yields different solutions.
p1 <- cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')
set.seed(168575)
p2 <- cluster_leiden(graph = igraph::as.undirected(g, mode = 'collapse'), n_iterations = 50, objective_function='modularity')
compare(p1, p2)
clusters_df <- base::data.frame(cluster = base::as.numeric(p1$membership), gene = p1$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))
print(cluster_freqs)
clusters_df <- base::data.frame(cluster = base::as.numeric(p2$membership), gene = p2$names)
cluster_freqs <- data.frame(table(clusters_df$cluster))
print(cluster_freqs)
I observe the same result in the following environments:
- R 4.2.0 with igraph 1.3.4 on Windows (64-bit Intel CPU)
- R 4.2.0 with igraph 1.3.4 on macOS Monterey (Apple M1)
- R 4.1.2 with igraph 1.3.4 on Ubuntu Linux 20.04 (64-bit Intel CPU)
The summary of the partition is this:
   Var1 Freq
1     1  764
2     2  291
3     3  730
4     4 1290
5     5  287
6     6 1360
7     7  214
8     8  439
9     9  791
10   10  339
11   11  279
12   12  977
13   13   83
14   14   86
15   15    4
16   16  201
17   17   43
18   18   85
19   19   74
20   20   10
21   21    3
22   22    4
Can you let me know the exact R and igraph version that you are using on both platforms and whether it is the official CRAN R or some other R distribution (Anaconda R, Microsoft R Open etc)?
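A minimal sketch of how this information could be collected on each machine. It uses only base R plus `utils::packageVersion()`. Printing the RNG kinds is an extra check that was not requested above, on the assumption that differing RNG defaults between R installations could be one source of differing results:

```r
# Print the R version string and the platform this R build targets
cat(R.version.string, "\n")
cat(R.version$platform, "\n")

# Print the versions of the packages involved (reported as missing if absent)
for (pkg in c("igraph", "leidenAlg")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(utils::packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "not installed\n")
  }
}

# Default RNG kinds; if these differ between machines, the same seed
# produces different random streams
rng_kinds <- RNGkind()
print(rng_kinds)
```

The output of `sessionInfo()` covers most of this as well, and additionally lists the loaded package versions in one place.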
Thanks for covering in my absence, @ntamas! In addition to the tests for igraph that @ntamas performed, I can confirm that the leidenalg implementation in Python also yields identical results on both Linux and Windows (both for this graph and for random graphs).
Given that a seed is also set in this R interface (line 70 in e0eeef6), I would assume that the R interface would also yield identical results on Linux and Windows whenever a seed is set in R.
Without being able to reproduce the issue it seems difficult to track it down.
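One way to narrow this down might be to save the membership vector on each platform and compare the two files directly, since cluster labels can be permuted between runs even when the grouping is identical. The following is only a sketch: the file names are hypothetical, and the toy vectors stand in for the real `partition$membership` results from each machine.

```r
# Toy stand-ins for the membership vectors obtained on each platform
m_linux   <- c(a = 1, b = 1, c = 2, d = 2)
m_windows <- c(a = 2, b = 2, c = 1, d = 1)  # same grouping, labels swapped

# Save one file per platform (hypothetical file names, written to tempdir)
f_linux   <- file.path(tempdir(), "membership_linux.csv")
f_windows <- file.path(tempdir(), "membership_windows.csv")
write.csv(data.frame(gene = names(m_linux), cluster = m_linux),
          f_linux, row.names = FALSE)
write.csv(data.frame(gene = names(m_windows), cluster = m_windows),
          f_windows, row.names = FALSE)

# Read both files back and align the rows by gene name
merged <- merge(read.csv(f_linux), read.csv(f_windows),
                by = "gene", suffixes = c("_linux", "_windows"))

# The partitions agree up to relabelling if every row and every column of
# the contingency table has exactly one non-zero cell
tab <- table(merged$cluster_linux, merged$cluster_windows)
same_partition <- all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
print(same_partition)
```

If `same_partition` is FALSE for the real data, the non-zero off-pattern cells of `tab` show which clusters were split or merged between the two platforms.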