yanlinlin82/ggvenn

Show values (sums) in venn diagram

williamlai2 opened this issue · 6 comments

It would be good to be able to show values.

Something like this (not sure if I am representing this correctly):

a <- c(1, 3) # A, AB
b <- c(1, 2) # AB, B

venn <- c(a[1], # A
          a[2] + b[1], # AB
          b[2]) # B

> venn
[1] 1 4 2

Sorry, I am confused about the example. Do the numbers in the 'a' and 'b' vectors represent element counts? If so, why are a[2] and b[1] not equal? Could you describe your idea more concretely?

Thanks for getting back to me. Imagine that they are dollars in groups, but the groups overlap (I am working with custom industry classifications and want to show the output for overlapping groups).

In the example:

a <- c(1, 3) # A, AB
b <- c(1, 2) # AB, B

Since both a[2] and b[1] are AB, why not write it like this:

a <- c(1, 3+1) # A, AB
b <- c(3+1, 2) # AB, B

Could you please provide a real example that explains why the AB values in the two vectors differ?

Let's say that the numbers are jobs.

df <- structure(list(a = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 468, 0, 0, 0, 1446, 
                           3, 0, 0, 1043, 1593, 0, 0, 0, 742, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 198, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           922, 249, 0, 0, 2060, 93, 0, 605, 274, 24, 161, 417, 122, 3, 
                           1560, 0, 3, 0, 0, 55, 73, 363, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 433, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 576, 34, 0, 0, 0, 0, 22, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1821, 4433, 
                           19062, 0, 0, 0, 0, 0, 0, 873, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
                     b = c(0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 468, 0, 0, 0, 1446, 3, 0, 0, 1043, 1593, 0, 0, 0, 742, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          2137, 198, 284, 1181, 14588, 100, 340, 1558, 211, 6431, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 2022, 30939, 39, 169, 1845, 1811, 6088, 
                          2323, 1241, 1311, 13009, 1617, 6857, 0, 81, 63, 0, 124, 1642, 
                          537, 27404, 237, 1393, 1657, 0, 0, 620, 360, 152, 2922, 922, 
                          249, 410, 295, 2060, 93, 1724, 605, 274, 24, 161, 417, 122, 3, 
                          1560, 312, 3, 1785, 1053, 55, 73, 363, 13912, 1126, 0, 0, 217, 
                          626, 0, 10, 0, 0, 0, 0, 0, 0, 0, 108, 2635, 0, 0, 15, 0, 0, 6, 
                          135, 3, 0, 0, 0, 0, 830, 0, 0, 102, 0, 0, 397, 0, 0, 0, 0, 258, 
                          0, 0, 13, 128, 0, 0, 0, 0, 29, 0, 419, 0, 0, 0, 28, 0, 91, 0, 
                          0, 0, 0, 0, 0, 137, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3158, 0, 0, 0, 0, 0, 0, 0, 
                          0, 2392, 0, 0, 0, 0, 0, 0, 13979, 1821, 4433, 19062, 1282, 7825, 
                          18692, 10279, 902, 1140, 873, 89, 5215, 951, 220, 529, 9144, 
                          712, 4212, 8, 630, 233, 538, 5747, 1780, 11, 7314, 1073, 16007, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 177, 358, 0, 563, 1006, 0, 0, 0, 1848, 
                          0, 281, 0, 1052, 0, 0, 0, 0, 0, 825, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
                     c = c(10623, 25707, 3343, 279, 
                          4007, 5372, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2199, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 168, 0, 0, 0, 102, 749, 85, 3110, 157, 648, 3204, 520, 
                          96, 50, 106, 846, 181, 290, 162, 183, 1, 337, 700, 191, 81, 23, 
                          378, 25, 93, 14, 459, 181, 257, 680, 802, 0, 1349, 10, 419, 306, 
                          1895, 167, 54, 908, 1252, 226, 177, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0)), 
                row.names = c(NA, -443L), 
                class = c("tbl_df", "tbl", "data.frame"))


library(ggvenn)
x <- list(`A` = df$a,
          `B` = df$b,
          `C` = df$c)

ggvenn(x,
       c("A", "B", "C"),
       show_percentage = FALSE)  

The Venn diagram shows the number of intersecting values rather than the sum of the jobs. Does that make sense?

Looking at the code in your package, you have a show_elements argument. The value shown would just be the sum of those elements if the items are numeric.

Thanks for the code! I understand now.

There are two ways to use ggvenn: one takes a list as input, and the other takes a data.frame.

In the former case (list), ggvenn treats the list elements (x$A, x$B, x$C) as sets, so values shared between sets are counted in the intersection. For example:

ggvenn(list(A = c(1,2,3,4), B = c(1,5,6)), show_percentage = FALSE)

The result is exactly the same as:

ggvenn(list(A = c("A","B","C","D"), B = c("A","E","F")), show_percentage = FALSE)

For the same reason, duplicated elements are removed before plotting:

ggvenn(list(A = c(1,1,1,2,3,4), B = c(1,5,6,6,6)), show_percentage = FALSE) 

The output plot is the same.
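The set behaviour described above can be reproduced with base R set functions. This is only a sketch of the counting logic, not ggvenn's internal code; the variable name region_counts is made up for illustration.

```r
# Same inputs as the ggvenn call above, including the duplicated values.
A <- c(1, 1, 1, 2, 3, 4)
B <- c(1, 5, 6, 6, 6)

# Base R set functions drop duplicates, so each distinct value counts once.
region_counts <- c(
  A_only = length(setdiff(A, B)),   # values 2, 3, 4
  AB     = length(intersect(A, B)), # value 1
  B_only = length(setdiff(B, A))    # values 5, 6
)
print(region_counts)
```

The counts come out as 3, 1 and 2, the same as for the inputs without duplicates.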

In your example above, all zeros will be merged into one element before plotting. I suspect that treating numeric vectors as counts may lead to more confusion. I am not sure whether an explicit argument (such as 'number_as_count') would help.

In the latter case (input as a 'data.frame'), ggvenn so far picks up only logical columns for plotting. Your suggestion of treating numeric values as counts (and summing them) is more intuitive and indeed a good idea, something like this (using 'df' directly, rather than constructing another list 'x'):
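To illustrate the point about zeros with base R only (the short vectors below are made-up excerpts in the style of the posted data, not the full columns):

```r
# Toy vectors with many zeros, similar in shape to df$a and df$b.
a <- c(0, 0, 468, 0, 1446)
b <- c(0, 2137, 468, 0, 0)

# Treated as sets, every zero collapses into the single element 0 ...
distinct_a <- unique(a)
# ... and that 0 then appears in the intersection as if it were a shared item.
shared <- intersect(a, b)

print(distinct_a)
print(shared)
```

Here the intersection contains 0 and 468, so the zero placeholder is counted as a common element alongside the real overlap.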

ggvenn(df, c("a", "b", "c"))  # pick numeric columns

What do you think?

Thanks for the explanation. It could be an option, as you mentioned.
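Until numeric columns are supported, one possible workaround is to convert them to logical membership columns first. The sketch below assumes that each row is one item and that a positive value means the item belongs to that group; the names toy and toy_logical are made up for illustration. Note that this plots the number of rows per region, not the sum of the values.

```r
# Toy data: each row is one item, each column one group.
toy <- data.frame(a = c(1, 3, 0), b = c(0, 3, 2))

# Assumption: a value greater than zero means "member of this group".
toy_logical <- as.data.frame(lapply(toy, function(v) v > 0))

# Plot only if ggvenn is installed; logical columns are what it accepts today.
if (requireNamespace("ggvenn", quietly = TRUE)) {
  p <- ggvenn::ggvenn(toy_logical, c("a", "b"), show_percentage = FALSE)
  print(p)
}
```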

With the data, it would be something like this:

a <- df$a
b <- df$b
c <- df$c

A <- sum(as.numeric(setdiff(a, union(b,c))))
B <- sum(as.numeric(setdiff(b, union(a,c))))
C <- sum(as.numeric(setdiff(c, union(a,b))))
AB <- sum(as.numeric(setdiff(intersect(a,b),c)))
AC <- sum(as.numeric(setdiff(intersect(a,c),b)))
BC <- sum(as.numeric(setdiff(intersect(b,c),a)))
ABC <- sum(as.numeric(intersect(intersect(a,b),c)))
sum_ABC <- A + B + C + AB + AC + BC + ABC

Edit: Actually, this won't work, because setdiff and intersect treat the vectors as sets and discard duplicates, so repeated values are only summed once.
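One way around the duplicate problem might be to work row by row instead of on the value sets. The sketch below is not part of ggvenn; region_sums is a hypothetical helper. It assumes that each row is one item, that a value greater than zero marks membership in a group, and that an item carries the same value in every group it belongs to (so the row maximum is used as its weight).

```r
# Hypothetical helper (not a ggvenn function): sum a numeric weight per Venn region.
region_sums <- function(data, columns) {
  values <- as.matrix(data[, columns, drop = FALSE])
  member <- values > 0                       # assumed membership rule
  # Label each row by the groups it belongs to, e.g. "a&b"; "" means no group.
  region <- apply(member, 1, function(m) paste(columns[m], collapse = "&"))
  weight <- apply(values, 1, max)            # assumed: same value in every group
  keep <- region != ""
  # Sum the weights within each region; returns a named numeric vector.
  vapply(split(weight[keep], region[keep]), sum, numeric(1))
}

# Toy data: one item only in a, one in both, one only in b, one in neither.
toy <- data.frame(a = c(1, 3, 0, 0), b = c(0, 3, 2, 0))
sums <- region_sums(toy, c("a", "b"))
print(sums)
```

With the posted data this would be called as region_sums(df, c("a", "b", "c")). Because rows are never deduplicated, two items with the same job count are both included in the sum.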