Very slow, doesn't work with large datasets
library(eulerr)
fit <- euler(c( "A" = 55, "B" = 810, "C" = 102, "D" = 364, "E" = 101, "F" = 24,
"G" = 34, "H" = 61, "I" = 194, "J" = 107, "K" = 53, "L" = 75, "M" = 11,
"N" = 65, "O" = 16, "P" = 82, "Q" = 13, "F&O" = 5, "G&O" = 5, "F&G" = 5,
"D&I" = 47, "D&E" = 33, "K&L" = 9, "A&K" = 7, "K&N" = 7, "A&N" = 7, "A&L" = 7,
"K&P" = 7, "A&P" = 7, "L&N" = 7, "N&P" = 7, "C&K" = 7, "A&C" = 7, "L&P" = 7,
"E&G" = 6, "J&K" = 7, "A&J" = 7, "C&O" = 5, "G&H" = 4, "C&N" = 7, "J&N" = 7,
"E&F" = 5, "C&F" = 5, "C&L" = 7, "J&L" = 7, "C&P" = 7, "J&P" = 7, "C&G" = 5,
"D&K" = 15, "C&J" = 7, "A&D" = 12, "D&L" = 12, "E&O" = 3, "E&H" = 4, "C&I" = 6,
"C&D" = 8, "D&H" = 7, "D&N" = 7, "D&P" = 7, "D&J" = 7, "C&E" = 3, "H&O" = 1,
"B&D" = 15, "B&C" = 11, "F&H" = 1, "E&M" = 1, "B&E" = 8, "B&K" = 7, "I&K" = 2,
"A&B" = 7, "B&N" = 7, "B&L" = 7, "B&P" = 7, "D&F" = 3, "B&J" = 7, "I&L" = 2,
"C&H" = 1, "D&O" = 1, "D&G" = 1),
shape = "ellipse")
# ^^^^^^^ takes forever
fit$stress
fit$diagError
plot(fit)
Meanwhile, the browser-based venn.js is instant, just less accurate.
Is there some technique that could be applied to speed this up, or at least a way to specify a stopping point for the accuracy?
Thanks
Hm, yes I agree that there should be an option to set tolerance and maximum number of iterations.
It looks like an easy fix too. I'm happy to consider pull requests if you or anyone else wants to contribute.
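A minimal sketch of what exposing those options could look like, assuming the values are simply forwarded to the stats::nlm() call eulerr uses internally (quoted further down in this thread); the control argument and wrapper name here are hypothetical, not the current eulerr API:

euler_nlm_sketch <- function(pars, areas_disjoint, circle,
                             control = list(iterlim = 1e4, gradtol = 1e-4)) {
  # Forward user-supplied stopping criteria to the optimizer instead of
  # hard-coding iterlim = 1e6.
  stats::nlm(f = optim_final_loss,
             p = pars,
             areas = areas_disjoint,
             circle = circle,
             iterlim = control$iterlim,
             gradtol = control$gradtol)$estimate
}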
I was going to parametrise it, but I noticed my N is 131071.
It's stuck at this stats::nlm() call:
stats::nlm(f = optim_final_loss,
p = pars,
areas = areas_disjoint,
circle = circle,
iterlim = 1e6)$estimate
Even with iterlim set to 1, it's still taking hours.
Either the algorithm is slow or there is a bug.
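For what it's worth, a toy check (not eulerr's code) shows why a small iterlim barely helps: even a single Newton-type iteration makes nlm evaluate the objective repeatedly to estimate gradients numerically, so an expensive objective stays expensive.

calls <- 0
f <- function(x) { calls <<- calls + 1; sum((x - 1:5)^2) }  # cheap stand-in objective
invisible(stats::nlm(f, p = rep(0, 5), iterlim = 1))
calls  # more than one evaluation, even with iterlim = 1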
Also I don't see how my N is so large with just 28 nodes.
28 is not a small number, and I'm not actually sure it makes sense to even try an Euler diagram here, since the output is bound to be wildly misleading. Finding good approximate Euler diagrams is hard even at 4 sets.
The problem is that eulerr tries to compute the areas of the 2^28 - 1 = 268435455 possible intersections in the diagram, which is obviously not going to end well.
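Just to make the growth concrete, a quick back-of-the-envelope in R (illustration only; incidentally, 131071 is exactly what you get for the 17 sets A through Q in the example above):

n_sets <- c(4, 17, 28)
data.frame(sets = n_sets, possible_intersections = 2^n_sets - 1)
#   sets possible_intersections
# 1    4                     15
# 2   17                 131071
# 3   28              268435455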
It sounds like venn.js is doing something differently, since, as you say, it's much faster, but I'm not really sure how it's possible to avoid considering all the intersections.
Or, hm, actually it should be possible to alleviate this problem by ruling out some of the possible intersections: if A does not intersect C, then obviously A&C&whatever must also be empty. So this should be possible to fix, at least partially.
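A rough sketch of how that pruning could work, assuming the input is a named vector like the one passed to euler() above (combinations not listed are treated as zero); none of this is eulerr's actual code:

input <- c(A = 10, B = 5, C = 8, "A&B" = 2)  # toy input: A&C and B&C are absent, i.e. zero
sets  <- unique(unlist(strsplit(names(input), "&", fixed = TRUE)))
parts <- strsplit(names(input), "&", fixed = TRUE)

# TRUE if some listed combination with a positive value contains both sets.
pair_overlaps <- function(x, y) {
  any(mapply(function(p, v) v > 0 && all(c(x, y) %in% p), parts, input))
}

# Keep a candidate combination only if every pair of its members overlaps;
# otherwise the whole intersection must be empty and never needs evaluating.
keep_combo <- function(members) {
  pairs <- combn(members, 2, simplify = FALSE)
  all(vapply(pairs, function(p) pair_overlaps(p[1], p[2]), logical(1)))
}

combn(sets, 3, simplify = FALSE)                       # only A&B&C exists a priori
Filter(keep_combo, combn(sets, 3, simplify = FALSE))   # ...and it gets pruned, since A&C is empty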
Agreed. It's not the number of nodes that matters, it's the overlaps.
I was thinking of starting with just one overlap and adding overlaps one at a time until I find the outlier overlap that causes the algorithm to struggle to find a solution.
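Something like the following could serve as that diagnostic, assuming a small toy input (the combos and timing loop here are illustrative only, not a fix):

library(eulerr)
combos     <- c(A = 55, B = 810, C = 102, "A&B" = 7, "B&C" = 11, "A&C" = 7)  # toy input
singletons <- combos[!grepl("&", names(combos), fixed = TRUE)]
overlaps   <- combos[grepl("&", names(combos), fixed = TRUE)]

# Refit with one more overlap each time and record the elapsed time;
# a sudden jump points at the overlap that stresses the optimizer.
timings <- vapply(seq_along(overlaps), function(i) {
  subset <- c(singletons, overlaps[seq_len(i)])
  system.time(euler(subset, shape = "ellipse"))[["elapsed"]]
}, numeric(1))
setNames(timings, names(overlaps))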
I'm having a similar issue. Any solution?