how to query other GSEA databases (for mouse)
Closed this issue · 7 comments
Hi Jack,
SCPA is a great tool for pathway analysis for my Seurat_Objects.
While I can query the Hallmark pathways in the mouse with...
pathways <- msigdbr("Mus musculus", "H") %>%
format_pathways()
I just need some clarification on how to query other databases with msigdbr().
First, the above line of code does pull from the mouse GSEA databases, right? The screenshot below shows the 'pathways' object, which I assume is mouse based on how the genes are written, e.g. "Adam10".
Second, how do I query 'M5' or 'M8' from GSEA for mouse?
Can the code below work for querying the ontology gene sets?
pathways2 <- msigdbr(species = "Mus musculus", category = "C5") %>%
format_pathways()
Thx again, enjoying this R-package
-Bart
Hi Bart,
Glad it's useful. If you're looking at mouse specific pathways, it's probably better to download the gmt files from the MSigDB site here, and then just specify the gmt file path as your input to SCPA. There's some information on how to use gmt files with SCPA here.
Some nuances about msigdbr... When you pull mouse gene sets from msigdbr, it's not actually the same as what's on the mouse MSigDB site, as the msigdbr package was built on an earlier version of MSigDB before the mouse gene sets were released (see this from the msigdbr documentation). I think msigdbr is actually just using the same pathways and converting gene ids from human to mouse (and other species) internally, which is different than what's on the MSigDB site.
So e.g.
mouse_c5 <- msigdbr(species = "Mus musculus", category = "C5")
This is not equivalent to what's on the mouse MSigDB site. It's just the human C5 genes converted to mouse format. There will be crossover, but also differences.
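If you want to see how much crossover there is for a given pathway, here's a minimal base R sketch. It assumes you've already pulled the gene symbols for the same pathway from both sources into two character vectors. The function name `compare_gene_sets` and the gene symbols are made up for illustration and are not real MSigDB content.

```r
# Compare two versions of the same gene set (character vectors of gene symbols).
# compare_gene_sets() is a hypothetical helper, not part of SCPA or msigdbr.
compare_gene_sets <- function(converted, native) {
  shared <- intersect(converted, native)
  list(
    shared         = shared,                       # genes in both versions
    only_converted = setdiff(converted, native),   # only in the human-converted set
    only_native    = setdiff(native, converted),   # only in the mouse MSigDB set
    jaccard        = length(shared) / length(union(converted, native))
  )
}

# Toy symbols for illustration only
converted <- c("Adam10", "Notch1", "Hes1", "Jag1")
native    <- c("Adam10", "Notch1", "Dll1")

res <- compare_gene_sets(converted, native)
res$shared
res$only_native
res$jaccard
```

A Jaccard index near 1 would mean the two versions are nearly identical; lower values mean the choice of source matters more for that pathway.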
Jack
Thanks for the clarification. So evidently I was just looking at human genes converted to mouse format, good to know.
Following your response, I downloaded the gmt files for mouse and tried to load them into R's environment, but the object is 'empty' (see screenshot).
Also, can I follow the same code for a Seurat object, or do I have to use 'compare_pathways()'?
My goal is to compare two cell types in my integrated Seurat object.
best,
Bart
You just need to specify the parent directory, because list.files() is going to find the .gmt file, e.g.
gmt_files <- list.files(path = "/Users/bartbryant/Desktop/SCPA", pattern = "gmt", full.names = TRUE)
scpa_out <- compare_seurat(lmod1, pathways = gmt_files... etc.)
This option is best if you have multiple gmt files. If you just have one gmt file, you can just specify the file path to the gmt file itself e.g.
gmt_file <- "/Users/bartbryant/.../mh.all_rest_of_the_filename_.gmt"
scpa_out <- compare_seurat(lmod1, pathways = gmt_file... etc.)
And yup, you can just use the usual compare_seurat() with this as well.
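For comparing two cell types, a rough sketch of the full call might look like the one below. This assumes the `group1` and `group1_population` arguments as I recall them from the SCPA documentation, so check `?compare_seurat` for your installed version. The metadata column "cell_type" and the labels "type_a" and "type_b" are hypothetical placeholders, as is the wrapper name `run_scpa`.

```r
# Sketch only: wrapped in a function so nothing runs until you pass in your
# own Seurat object and gmt file path (or vector of paths).
# "cell_type", "type_a" and "type_b" are hypothetical; swap in the metadata
# column and the two labels that exist in your object.
run_scpa <- function(seurat_obj, gmt_file) {
  SCPA::compare_seurat(
    seurat_obj,
    group1 = "cell_type",                       # metadata column to split on
    group1_population = c("type_a", "type_b"),  # the two populations to compare
    pathways = gmt_file                         # path(s) to the mouse gmt file(s)
  )
}
```

You'd then call it with something like `scpa_out <- run_scpa(lmod1, gmt_file)`.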
Thx, that covered it.
Additional question:
How do I find the mouse genes associated with particular pathways?
What code would do this?
I used the code below, following issue #38:
pathways <- msigdbr("Mus musculus", "H") %>%
format_pathways()
names(pathways) <- sapply(pathways, function(x) x$Pathway[1]) # just to name the list, so easier to visualise
pathways$HALLMARK_MYOGENESIS$Genes
Will the above code pull mouse genes rather than human genes?
Yup, this will work for mouse genes
SCPA has a function called get_paths() that can generate gene lists from a gmt file, so you could use this...
pathways <- get_paths("~/Downloads/m2.cp.v2023.1.Mm.symbols.gmt")
names(pathways) <- sapply(pathways, function(x) x$Pathway[1])
pathways$REACTOME_INTERLEUKIN_2_FAMILY_SIGNALING
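If you just want to peek inside a gmt file without SCPA, here's a minimal base R sketch. A gmt file is tab-separated with one gene set per line: the set name, a description field, then the genes. The helper `read_gmt` is a hypothetical name, and the example writes a tiny made-up file to a temp location so it runs without any download; the pathway names and genes in it are toy values, not real MSigDB content.

```r
# Write a tiny two-line gmt file so the example is self-contained.
# Pathway names, descriptions and genes are made up for illustration.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "TOY_PATHWAY_A\thttp://example.org/a\tAdam10\tNotch1\tHes1",
  "TOY_PATHWAY_B\thttp://example.org/b\tIl2\tIl2ra"
), gmt_path)

# read_gmt() is a hypothetical helper: returns a named list of gene vectors.
read_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  genes <- lapply(fields, function(x) x[-(1:2)])  # drop name and description
  names(genes) <- vapply(fields, function(x) x[1], character(1))
  genes
}

gene_sets <- read_gmt(gmt_path)
names(gene_sets)          # pathway names in the file
gene_sets$TOY_PATHWAY_B   # genes for one pathway
```

To use it on a real file, point `read_gmt()` at the gmt you downloaded from the mouse MSigDB site instead of the temp file.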