Show values (sums) in Venn diagram

Question

williamlai2 opened this issue 4 years ago · 6 comments

It would be good to be able to show values, such as the sum of the values in each region, instead of only element counts.

Something like this (not sure if I am representing this correctly):

a <- c(1, 3) # A, AB
b <- c(1, 2) # AB, B

venn <- c(a[1], # A
          a[2] + b[1], # AB
          b[2]) # B

> venn
[1] 1 4 2

Answer 1 · 2021-02-02T05:00:00.000Z

Sorry, I am confused by the example. Do the numbers in the 'a' and 'b' vectors mean element counts? If so, why are a[2] and b[1] not equal? Could you describe your idea more concretely?

Answer 2 · 2021-02-02T06:34:13.000Z

Thanks for getting back to me. Imagine that they are dollars in groups, but the groups overlap (I am working with custom industry classifications and want to show the output for overlapping groups).

Answer 3 · 2021-02-02T13:13:01.000Z

In the example:

a <- c(1, 3) # A, AB
b <- c(1, 2) # AB, B

Since both a[2] and b[1] refer to AB, why not code it like this:

a <- c(1, 3+1) # A, AB
b <- c(3+1, 2) # AB, B

Could you please provide a real example that explains why the AB values in the two vectors differ?

Answer 4 · 2021-02-02T22:39:30.000Z

Let's say that the numbers are jobs.

df <- structure(list(a = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 468, 0, 0, 0, 1446, 
                           3, 0, 0, 1043, 1593, 0, 0, 0, 742, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 198, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           922, 249, 0, 0, 2060, 93, 0, 605, 274, 24, 161, 417, 122, 3, 
                           1560, 0, 3, 0, 0, 55, 73, 363, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 433, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 576, 34, 0, 0, 0, 0, 22, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1821, 4433, 
                           19062, 0, 0, 0, 0, 0, 0, 873, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
                     b = c(0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 468, 0, 0, 0, 1446, 3, 0, 0, 1043, 1593, 0, 0, 0, 742, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          2137, 198, 284, 1181, 14588, 100, 340, 1558, 211, 6431, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 2022, 30939, 39, 169, 1845, 1811, 6088, 
                          2323, 1241, 1311, 13009, 1617, 6857, 0, 81, 63, 0, 124, 1642, 
                          537, 27404, 237, 1393, 1657, 0, 0, 620, 360, 152, 2922, 922, 
                          249, 410, 295, 2060, 93, 1724, 605, 274, 24, 161, 417, 122, 3, 
                          1560, 312, 3, 1785, 1053, 55, 73, 363, 13912, 1126, 0, 0, 217, 
                          626, 0, 10, 0, 0, 0, 0, 0, 0, 0, 108, 2635, 0, 0, 15, 0, 0, 6, 
                          135, 3, 0, 0, 0, 0, 830, 0, 0, 102, 0, 0, 397, 0, 0, 0, 0, 258, 
                          0, 0, 13, 128, 0, 0, 0, 0, 29, 0, 419, 0, 0, 0, 28, 0, 91, 0, 
                          0, 0, 0, 0, 0, 137, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3158, 0, 0, 0, 0, 0, 0, 0, 
                          0, 2392, 0, 0, 0, 0, 0, 0, 13979, 1821, 4433, 19062, 1282, 7825, 
                          18692, 10279, 902, 1140, 873, 89, 5215, 951, 220, 529, 9144, 
                          712, 4212, 8, 630, 233, 538, 5747, 1780, 11, 7314, 1073, 16007, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 177, 358, 0, 563, 1006, 0, 0, 0, 1848, 
                          0, 281, 0, 1052, 0, 0, 0, 0, 0, 825, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
                     c = c(10623, 25707, 3343, 279, 
                          4007, 5372, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2199, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 168, 0, 0, 0, 102, 749, 85, 3110, 157, 648, 3204, 520, 
                          96, 50, 106, 846, 181, 290, 162, 183, 1, 337, 700, 191, 81, 23, 
                          378, 25, 93, 14, 459, 181, 257, 680, 802, 0, 1349, 10, 419, 306, 
                          1895, 167, 54, 908, 1252, 226, 177, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0)), 
                row.names = c(NA, -443L), 
                class = c("tbl_df", "tbl", "data.frame"))


library(ggvenn)
x <- list(`A` = df$a,
          `B` = df$b,
          `C` = df$c)

ggvenn(x,
       c("A", "B", "C"),
       show_percentage = FALSE)

The Venn diagram shows the number of elements in each intersection rather than the sum of the jobs. Does that make sense?

Looking at the code in your package, there is a show_elements argument. For numeric items, this would just be the sum of those elements.

Answer 5 · 2021-02-03T00:59:44.000Z

Thanks for the code! I understand now.

There are two ways to use ggvenn. One takes a list as input, and the other takes a data.frame.

In the former case (list), ggvenn treats the list elements (x$A, x$B, x$C) as sets, so identical values in different sets are counted in their intersection. For example:

ggvenn(list(A = c(1,2,3,4), B = c(1,5,6)), show_percentage = FALSE)

The result is exactly the same as:

ggvenn(list(A = c("A","B","C","D"), B = c("A","E","F")), show_percentage = FALSE)

For the same reason, duplicated elements will be removed before plotting:

ggvenn(list(A = c(1,1,1,2,3,4), B = c(1,5,6,6,6)), show_percentage = FALSE)

The output plot is the same.

In your example above, all the zeros are merged into a single element before plotting. I suspect that treating numeric vectors as counts may cause more confusion, and I am not sure whether an explicit argument (such as 'number_as_count') would help.
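The list-input behaviour described above can be mimicked with base R set functions. This is a minimal sketch, assuming ggvenn's list-input counts agree with `unique()`, `setdiff()` and `intersect()`; it is not ggvenn's own code, and `region_counts_2()` is a hypothetical name.

```r
# Count the elements in each region of a two-set Venn diagram,
# treating each vector as a set (duplicated values are dropped).
region_counts_2 <- function(A, B) {
  A <- unique(A)
  B <- unique(B)
  c(A_only = length(setdiff(A, B)),
    AB     = length(intersect(A, B)),
    B_only = length(setdiff(B, A)))
}

region_counts_2(c(1, 2, 3, 4), c(1, 5, 6))
region_counts_2(c("A", "B", "C", "D"), c("A", "E", "F"))
region_counts_2(c(1, 1, 1, 2, 3, 4), c(1, 5, 6, 6, 6))
# A_only = 3, AB = 1, B_only = 2 for all three calls

# Zeros collapse the same way: repeated zeros become a single element
length(unique(c(0, 0, 0, 468, 0, 1446)))  # 3
```

Base `setdiff()` and `intersect()` already drop duplicates, so the explicit `unique()` calls only make the set semantics visible.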

In the latter case (input as 'data.frame'), ggvenn currently picks up only logical columns for plotting. Your suggestion of treating numeric values as counts (and showing their sum) is more intuitive there and indeed a good idea. It could look something like this (using 'df' directly, rather than constructing another list 'x'):

ggvenn(df, c("a", "b", "c"))  # proposed: pick numeric columns and show their sums

What do you think?
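To make the proposal concrete, here is a minimal base R sketch of what a sum-per-region option might compute from `df`. The helper `venn_sums()` is hypothetical (not part of ggvenn). It assumes that a row belongs to a set when its value in that column is greater than zero, and that a row's amount is the largest of its values across the selected columns; if one column should be authoritative instead, change that line.

```r
# Hypothetical helper (not part of ggvenn): sum a numeric amount per Venn region.
# Assumptions: one row per item; a row belongs to set X when its X column is > 0;
# the row's amount is the largest of its values across the chosen columns.
venn_sums <- function(df, columns) {
  vals <- as.matrix(df[columns])
  member <- vals > 0
  keep <- rowSums(member) > 0            # rows in no set are not drawn
  vals <- vals[keep, , drop = FALSE]
  member <- member[keep, , drop = FALSE]

  # Label each row by the sets it belongs to, e.g. "a", "a&b", "a&b&c"
  labels <- apply(member, 1, function(m) paste(columns[m], collapse = "&"))
  amount <- apply(vals, 1, max)

  # Sum the amounts per region; list single sets first, then pairs, then triples
  sums <- tapply(amount, labels, sum)
  sums <- sums[order(nchar(names(sums)), names(sums))]
  setNames(as.vector(sums), names(sums))
}

toy <- data.frame(a = c(5, 0, 2, 6), b = c(5, 3, 0, 0), c = c(0, 0, 2, 0))
venn_sums(toy, c("a", "b", "c"))
# a = 6, b = 3, a&b = 5, a&c = 2
```

With the posted data this would be called as `venn_sums(df, c("a", "b", "c"))`. For comparison, the data.frame mode that exists today takes logical columns, so something like `ggvenn(data.frame(a = df$a > 0, b = df$b > 0, c = df$c > 0), c("a", "b", "c"))` should draw the number of rows in each region rather than the sums.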

Answer 6 · 2021-02-03T02:16:11.000Z

Thanks for the explanation. It could be an option, as you mentioned.

With the data, it would be something like this:

a <- df$a
b <- df$b
c <- df$c

A <- sum(as.numeric(setdiff(a, union(b,c))))
B <- sum(as.numeric(setdiff(b, union(a,c))))
C <- sum(as.numeric(setdiff(c, union(a,b))))
AB <- sum(as.numeric(setdiff(intersect(a,b),c)))
AC <- sum(as.numeric(setdiff(intersect(a,c),b)))
BC <- sum(as.numeric(setdiff(intersect(b,c),a)))
ABC <- sum(as.numeric(intersect(intersect(a,b),c)))
sum_ABC <- A + B + C + AB + AC + BC + ABC

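Edit: Actually, this won't work: setdiff() and intersect() are set operations on the values, so duplicate values are dropped and the information about which rows belong to which groups is lost.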
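One way around the problem in the edit, sketched in base R: work with row membership (logical masks over the aligned rows of `df`) instead of set operations on the values, so duplicate values and zeros no longer collapse. `region_sums_abc()` is a hypothetical name, and taking `pmax()` of the three columns as each row's job count is an assumption.

```r
# Sketch: the seven region sums from row membership instead of set operations.
# Assumptions: the three vectors are aligned columns of the same rows;
# a row is in a group when its value is > 0; a row's jobs are pmax() of the three.
region_sums_abc <- function(jobs_a, jobs_b, jobs_c) {
  inA <- jobs_a > 0
  inB <- jobs_b > 0
  inC <- jobs_c > 0
  jobs <- pmax(jobs_a, jobs_b, jobs_c)   # assumed: one job count per row

  c(A   = sum(jobs[ inA & !inB & !inC]),
    B   = sum(jobs[!inA &  inB & !inC]),
    C   = sum(jobs[!inA & !inB &  inC]),
    AB  = sum(jobs[ inA &  inB & !inC]),
    AC  = sum(jobs[ inA & !inB &  inC]),
    BC  = sum(jobs[!inA &  inB &  inC]),
    ABC = sum(jobs[ inA &  inB &  inC]))
}

# With the data posted above (not run here):
# sums <- region_sums_abc(df$a, df$b, df$c)
# sum_ABC <- sum(sums)
```

Each row falls into exactly one of the seven masks (or into none, if all its values are zero), so `sum(sums)` equals the total jobs over rows that appear in at least one group, which is what `sum_ABC` was meant to be.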