jolars/eulerr

Very slow; doesn't work with large datasets

Opened this issue · 6 comments

library(eulerr)


fit <- euler(c( "A" = 55, "B" = 810, "C" = 102, "D" = 364, "E" = 101, "F" = 24,
            "G" = 34, "H" = 61, "I" = 194, "J" = 107, "K" = 53, "L" = 75, "M" = 11,
            "N" = 65, "O" = 16, "P" = 82, "Q" = 13, "F&O" = 5, "G&O" = 5, "F&G" = 5,
            "D&I" = 47, "D&E" = 33, "K&L" = 9, "A&K" = 7, "K&N" = 7, "A&N" = 7, "A&L" = 7,
            "K&P" = 7, "A&P" = 7, "L&N" = 7, "N&P" = 7, "C&K" = 7, "A&C" = 7, "L&P" = 7,
            "E&G" = 6, "J&K" = 7, "A&J" = 7, "C&O" = 5, "G&H" = 4, "C&N" = 7, "J&N" = 7,
            "E&F" = 5, "C&F" = 5, "C&L" = 7, "J&L" = 7, "C&P" = 7, "J&P" = 7, "C&G" = 5,
            "D&K" = 15, "C&J" = 7, "A&D" = 12, "D&L" = 12, "E&O" = 3, "E&H" = 4, "C&I" = 6,
            "C&D" = 8, "D&H" = 7, "D&N" = 7, "D&P" = 7, "D&J" = 7, "C&E" = 3, "H&O" = 1,
            "B&D" = 15, "B&C" = 11, "F&H" = 1, "E&M" = 1, "B&E" = 8, "B&K" = 7, "I&K" = 2,
            "A&B" = 7, "B&N" = 7, "B&L" = 7, "B&P" = 7, "D&F" = 3, "B&J" = 7, "I&L" = 2,
            "C&H" = 1, "D&O" = 1, "D&G" = 1),
        shape = "ellipse")

# ^^^^^^^ takes forever
fit$stress
fit$diagError

plot(fit)

Meanwhile, the browser-based venn.js is instant, just less accurate.

Could some technique be applied to speed it up, or at least an option added to specify a stopping point for accuracy?

Thanks

Hm, yes, I agree that there should be an option to set the tolerance and the maximum number of iterations.

It looks like an easy fix too. I'm happy to consider pull requests if you or anyone else wants to contribute.
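For reference, here is a minimal sketch of what that could look like (fit_with_limits is a hypothetical wrapper, not eulerr's actual API); stats::nlm(), which eulerr calls under the hood, already exposes both knobs:

# Hypothetical wrapper, not eulerr's actual API: forwards a tolerance
# and an iteration cap to stats::nlm().
fit_with_limits <- function(loss, start, tol = 1e-6, max_iter = 100, ...) {
  stats::nlm(f = loss, p = start,
             gradtol = tol,      # stop once the scaled gradient drops below tol
             iterlim = max_iter, # hard cap on optimizer iterations
             ...)
}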

I was going to parametrise it, but I noticed my N is 131071.
It's stuck at this stats::nlm call:

# The call inside eulerr that hangs; iterlim is nlm's iteration cap
stats::nlm(f = optim_final_loss,
           p = pars,
           areas = areas_disjoint,
           circle = circle,
           iterlim = 1e6)$estimate

Even with iterlim set to 1, it's still taking hours.

Either the algorithm is slow or there is a bug.
Also, I don't see how my N is so large with just 28 sets.

28 is not a small number, and I'm not actually sure that it makes sense to even try to use an Euler diagram here, since the output is bound to be wildly misleading. Finding good approximate Euler diagrams is hard even at 4 sets.

The problem is that eulerr tries to compute the areas of the 2^28 - 1 = 268435455 possible intersections in the diagram, which is obviously not going to end well.
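For scale, a quick check of how that count grows with the number of sets (incidentally, 131071 = 2^17 - 1, the intersection count for the 17 sets in the example at the top of this issue):

# Number of candidate intersections for 4, 17, and 28 sets:
sapply(c(4, 17, 28), function(n) 2^n - 1)
# [1]        15    131071 268435455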

It sounds like venn.js is doing something different, since as you say it's much faster, but I'm not really sure how it's possible to avoid considering all the intersections.

Or, hm, actually it should absolutely be possible to alleviate this problem by ruling out some of the possible intersections: if A does not intersect C, then obviously A&C&anything will also be empty. So yeah, this should be possible to fix (at least partially).
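A minimal sketch of that pruning idea (prune_combos and its arguments are hypothetical names, not eulerr internals):

# Drop every candidate intersection that contains a pair of sets
# whose pairwise overlap is already known to be zero.
prune_combos <- function(combos, pairwise_zero) {
  keep <- vapply(combos, function(combo) {
    !any(vapply(pairwise_zero,
                function(pair) all(pair %in% combo),
                logical(1)))
  }, logical(1))
  combos[keep]
}

combos <- list(c("A", "B"), c("A", "C"), c("A", "B", "C"))
pairwise_zero <- list(c("A", "C"))  # A and C do not intersect
prune_combos(combos, pairwise_zero)  # only c("A", "B") survives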

Agreed. It's not the number of sets that matters; it's the overlaps.

I was thinking of doing something like starting off with just one overlap and adding overlaps one at a time until I find the outlier overlap that's causing the algorithm to struggle to find a solution.
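Something like this sketch, say (the small singles/overlaps vectors here are just stand-ins for the full input above):

library(eulerr)

singles  <- c("A" = 55, "B" = 810, "C" = 102, "D" = 364)
overlaps <- c("A&B" = 7, "B&C" = 11, "C&D" = 8, "A&D" = 12)

# Refit with one more overlap each round and time it; the round where
# the runtime jumps points at the problematic overlap.
for (i in seq_along(overlaps)) {
  input   <- c(singles, overlaps[seq_len(i)])
  elapsed <- system.time(fit <- euler(input))["elapsed"]
  cat(sprintf("%d overlap(s), last added %s: %.2f s, diagError = %.4f\n",
              i, names(overlaps)[i], elapsed, fit$diagError))
}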

I'm having a similar issue. Any solution?