jackbibby1/SCPA

how to query other GSEA databases (for mouse)

Closed this issue · 7 comments

Hi Jack,
SCPA is a great tool for pathway analysis for my Seurat_Objects.
I can query the mouse Hallmark pathways with:

pathways <- msigdbr("Mus musculus", "H") %>%
  format_pathways()

I just need some clarification on how to query other databases with msigdbr().

First, the above line of code does pull from the mouse GSEA databases, right? The screenshot below shows the 'pathways' object, which I assume is mouse based on how the genes are written, e.g. "Adam10".
[Screenshot, 2023-08-07 2:38:44 PM]

Second, how do I query 'M5' or 'M8' from GSEA for mouse?
[Screenshot, 2023-08-07 2:42:40 PM]

Can the code below work for querying the ontology gene sets?

pathways2 <- msigdbr(species = "Mus musculus", category = "C5") %>%
  format_pathways()

Thx again, enjoying this R-package
-Bart

Hi Bart,

Glad it's useful. If you're looking at mouse-specific pathways, it's probably better to download the gmt files from the MSigDB site, and then just specify the gmt file path as your input to SCPA. There's also some information available on how to use gmt files with SCPA.

Some nuances about msigdbr: when you pull mouse gene sets from msigdbr, they're not actually the same as what's on the mouse MSigDB site, because the msigdbr package was built on an earlier version of MSigDB, before the mouse gene sets were released (see the msigdbr documentation). I think msigdbr is actually just using the same human pathways and converting gene ids from human to mouse (and other species) internally, which is different from what's on the MSigDB site.

So e.g.

mouse_c5 <- msigdbr(species = "Mus musculus", category = "C5")

This is not equivalent to what's on the mouse MSigDB site. It's just the human C5 genes converted to mouse format. There will be crossover, but also differences.
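
To get a feel for how much crossover there is, you could compare the gene vector for one pathway from each source. This is only a minimal base R sketch: the two gene vectors below are made up for illustration and stand in for the same pathway pulled from msigdbr and from a mouse gmt file.

```r
# Hypothetical gene vectors standing in for one pathway from two sources
from_msigdbr <- c("Adam10", "Il2ra", "Stat5a", "Jak3")          # converted from human
from_gmt     <- c("Il2ra", "Stat5a", "Jak3", "Il2rb", "Socs1")  # from a mouse gmt file

shared       <- intersect(from_msigdbr, from_gmt)  # genes present in both
only_msigdbr <- setdiff(from_msigdbr, from_gmt)    # only in the converted set
only_gmt     <- setdiff(from_gmt, from_msigdbr)    # only in the gmt set

# Jaccard index: shared genes divided by all distinct genes across both sets
jaccard <- length(shared) / length(union(from_msigdbr, from_gmt))

shared
only_msigdbr
only_gmt
jaccard
```

A Jaccard index near 1 would mean the two versions of the pathway are nearly identical, and lower values mean they diverge more.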

Jack

Thanks for the clarification. So evidently I was just looking at human genes converted to mouse format; good to know.

Following your response, I downloaded the gmt files for mouse and tried to load them into R, but the resulting object is empty (see screenshot).

Also, can I follow the same code for a Seurat object, or do I have to use compare_pathways()?

[Screenshot, 2023-08-07 4:33:50 PM]

My goal is to compare two cell types in my integrated Seurat object.
best,
Bart

You just need to specify the parent directory, because list.files() will find the .gmt files in it, e.g.

gmt_files <- list.files(path = "/Users/bartbryant/Desktop/SCPA", pattern = "gmt", full.names = TRUE)
scpa_out <- compare_seurat(lmod1, pathways = gmt_files... etc.)

This option is best if you have multiple gmt files. If you just have one gmt file, you can just specify the file path to the gmt file itself e.g.

gmt_file <- "/Users/bartbryant/.../mh.all_rest_of_the_filename_.gmt"
scpa_out <- compare_seurat(lmod1, pathways = gmt_file... etc.)

and yup, you can just use the usual compare_seurat() with this as well
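
One thing to be aware of with list.files(): the pattern argument is a regular expression, so "gmt" matches any file name containing those letters. Here is a small sketch of the difference a stricter pattern makes. The temporary folder and file names are made up for the demo; only list.files() and its arguments are real.

```r
# Temporary folder with made-up file names, for illustration only
gmt_dir <- file.path(tempdir(), "scpa_gmt_demo")
dir.create(gmt_dir, showWarnings = FALSE)
file.create(file.path(gmt_dir, c("mh.all.gmt", "m2.cp.gmt", "notes_on_gmt_files.txt")))

# "gmt" matches anywhere in the name, so the .txt file is picked up too
loose <- list.files(path = gmt_dir, pattern = "gmt", full.names = TRUE)

# "\\.gmt$" only matches names that end in ".gmt"
strict <- list.files(path = gmt_dir, pattern = "\\.gmt$", full.names = TRUE)

basename(loose)
basename(strict)
```

If the folder only contains gmt files, the simpler pattern is fine. The stricter one just guards against stray files being passed as pathways.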

thx, that covered it

An additional question: how do I find the mouse genes associated with particular pathways?
What code would do this?
I used the code below, following issue #38:
pathways <- msigdbr("Mus musculus", "H") %>%
  format_pathways()
names(pathways) <- sapply(pathways, function(x) x$Pathway[1]) # just to name the list, so it's easier to visualise
pathways$HALLMARK_MYOGENESIS$Genes

Will the above code pull mouse genes rather than human genes?

Yup, this will work for mouse genes

Sorry for the excess queries.
What about obtaining gene lists from other mouse databases?
E.g. I downloaded gmt files for 'canonical pathways'; how do I modify the above code to obtain these gene lists?
[Screenshot, 2023-08-08 10:27:05 AM]

Or can one get the gene lists associated with specific pathways from the GSEA site?

SCPA has a function called get_paths() that can generate gene lists from a gmt file, so you could use this...

pathways <- get_paths("~/Downloads/m2.cp.v2023.1.Mm.symbols.gmt")
names(pathways) <- sapply(pathways, function(x) x$Pathway[1])
pathways$REACTOME_INTERLEUKIN_2_FAMILY_SIGNALING
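
For context, a gmt file is plain text with one gene set per line: the set name, a description field, then the genes, all separated by tabs. Below is a minimal base R sketch of reading that layout. It writes a tiny made-up file first, so the DEMO_ pathway names and their contents are hypothetical. It is also not how get_paths() is implemented, just a way to see what is inside the file.

```r
# Write a tiny made-up gmt file so the example is self-contained
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "DEMO_PATHWAY_A\tdescription_a\tIl2ra\tStat5a\tJak3",
  "DEMO_PATHWAY_B\tdescription_b\tMyod1\tDes"
), gmt_path)

# Split each line on tabs: field 1 = name, field 2 = description, rest = genes
fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)

# Drop the first two fields to keep only the genes, then name each set
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- vapply(fields, function(x) x[1], character(1))

gene_sets$DEMO_PATHWAY_B  # genes for one pathway
lengths(gene_sets)        # number of genes per pathway
```

The same lookup by name works on the output of get_paths() once the list has been named as above.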