Feature request:UpsetR: extracting intersections elements
robertamig opened this issue · 2 comments
Hello! Would it be possible to add a feature to extract the lists of elements making up the intersections, or also the elements unique to each group when plotting an UpsetR plot?
here is my solution. It is bio-related.
I start with a list of vectors. In my case a list of genes belonging to different signatures.
str(list_filter)
List of 9
$ CellAge_Induces : chr [1:153] "AAK1" "ABI3" "ADCK5" "AGT" ...
$ CellAge_Inhibits : chr [1:121] "ACLY" "AKR1B1" "ASPH" "ATF7IP" ...
$ CLASSICAL_SASP : chr [1:38] "BGN" "CCL2" "CCL20" "COL1A1" ...
$ FRIDMAN_SENESCENCE_UP : chr [1:77] "ALDH1A3" "CCND1" "CD44" "CDKN1A" ...
$ ISM_SCORE : chr [1:128] "HSH2D" "OTOF" "TRIM69" "PSME1" ...
$ MOSERLE_IFNA_RESPONSE : chr [1:31] "CD274" "CMPK2" "CXCL10" "DDX58" ...
$ REACTOME_SENESCENCE_SASP: chr [1:110] "ANAPC1" "ANAPC10" "ANAPC11" "ANAPC15" ...
$ SAEPHIA_CURATED_SASP : chr [1:38] "IL1A" "IL1B" "CXCL10" "CXCL1" ...
$ senmayo : chr [1:125] "ACVR1B" "ANG" "ANGPT1" "ANGPTL4" ...
From this list, I generate two tables:
One with the unique gene names
df2 <- data.frame(gene=unique(unlist(list_filter)))
head(df2)
gene
1 AAK1
2 ABI3
3 ADCK5
4 AGT
5 AKT1
6 ALOX15B
dim(df2)
[1] 671 1
One is simply a "dataframe" version of the list. With every gene in the signature and the name of every signature.
df1 <- lapply(list_filter,function(x){
data.frame(gene = x)
}) %>%
bind_rows(.id = "path")
head(df1)
path gene
1 CellAge_Induces AAK1
2 CellAge_Induces ABI3
3 CellAge_Induces ADCK5
4 CellAge_Induces AGT
5 CellAge_Induces AKT1
6 CellAge_Induces ALOX15B
dim(df1)
[1] 821 2
now I iterate the search of each unique gene name and save the identity of the signatures in a column.
df_int <- lapply(df2$gene,function(x){
# pull the name of the intersections
intersection <- df1 %>%
dplyr::filter(gene==x) %>%
arrange(path) %>%
pull("path") %>%
paste0(collapse = "|")
# build the dataframe
data.frame(gene = x,int = intersection)
}) %>%
bind_rows()
head(df_int,n=20)
gene int
1 AAK1 CellAge_Induces
2 ABI3 CellAge_Induces
3 ADCK5 CellAge_Induces
4 AGT CellAge_Induces
5 AKT1 CellAge_Induces
6 ALOX15B CellAge_Induces
7 AR CellAge_Induces
8 ARPC1B CellAge_Induces
9 ASF1A CellAge_Induces
10 AXL CellAge_Induces|senmayo
11 BHLHE40 CellAge_Induces
12 BLK CellAge_Induces
13 BRAF CellAge_Induces
14 BRD7 CellAge_Induces
15 CAV1 CellAge_Induces
16 CCND1 CellAge_Induces|FRIDMAN_SENESCENCE_UP
17 CDK18 CellAge_Induces
18 CDKN1A CellAge_Induces|FRIDMAN_SENESCENCE_UP|REACTOME_SENESCENCE_SASP
19 CDKN1C CellAge_Induces|FRIDMAN_SENESCENCE_UP
20 CDKN1B CellAge_Induces|REACTOME_SENESCENCE_SASP
dim(df_int)
[1] 671 2
the dataframe can be summarised and compared to the output provided by calling
df_int %>%
group_by(int) %>%
summarise(n=n()) %>%
arrange(desc(n))
# A tibble: 47 × 2
int n
<chr> <int>
1 CellAge_Induces 126
2 CellAge_Inhibits 110
3 REACTOME_SENESCENCE_SASP 95
4 ISM_SCORE 93
5 senmayo 77
6 FRIDMAN_SENESCENCE_UP 44
7 ISM_SCORE|MOSERLE_IFNA_RESPONSE 27
8 CLASSICAL_SASP|senmayo 12
9 CLASSICAL_SASP 8
10 SAEPHIA_CURATED_SASP 8
# … with 37 more rows
# ℹ Use `print(n = ...)` to see more rows
upset(fromList(list_filter),nsets = 10)
ComplexHeatmap has a function for this: extract_comb()
https://jokergoo.github.io/ComplexHeatmap-reference/book/upset-plot.html#upset-utility-functions