RNA-seq: Pathway Analysis - GSEA
What are the goals of this new example analysis?
We have a GSEA example for microarray, but should create one for RNA-seq.
What kind of dataset will this need?
We just need ranked gene results from RNA-seq.
What steps should be included in this analysis?
I think most of the GSEA steps from microarray can stay the same. We can import the results of differential-expression_rnaseq_01.Rmd from a URL. But this won't be much different from differential-expression_rnaseq_01.Rmd, so we should discuss alternative strategies we might want to show here that we didn't show there.
What packages/methods do you recommend using or looking into for this analysis?
Still clusterProfiler
When this issue is addressed, note that the intro paragraph from #349 will need to be added here, and the table will need to be updated to reflect the RNA-seq versions of the analyses.
A rough outline of the plan to tackle this issue is as follows (largely mimicking what we have in the microarray GSEA example):
- Load the needed packages (clusterProfiler to run GSEA, msigdbr for the gene sets, and the species annotation package, Homo sapiens in this case)
- Import the results from this repo's differential-expression_rnaseq_01.Rmd using a URL (noting that this may change depending on how the visualization of the GSEA results for this dataset looks later in the example)
- For consistency with the microarray analyses using clusterProfiler, include the "getting familiar with clusterProfiler's options" section
- Isolate the hallmark gene sets using msigdbr()
- Perform gene identifier conversion using the mapIds() function
- Join the expression data and filter out any duplicate gene identifiers, keeping the one with the highest absolute log2FC value (as opposed to the t-statistic used in the microarray example, since t-statistic values do not appear to be available in the results dataset from differential-expression_rnaseq_01.Rmd)
- Determine the pre-ranked gene list based on the gene-level statistic from the previous step
- Perform GSEA using the GSEA() function from clusterProfiler, providing the pre-ranked gene list to the geneList argument and largely keeping the parameters the same as in the 02-microarray/pathway-analysis_microarray_02_gsea.Rmd example
- Preview the most negative and most positive enrichment scores and visualize them using enrichplot::gseaplot()
- Save plots using ggsave()
- Write GSEA results to file
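The identifier deduplication and ranking steps in the outline could look roughly like the base R sketch below. It is a minimal illustration, not the notebook's final code: the data frame is a small made-up stand-in for the imported results, and the column names entrez_id and log2FoldChange are assumptions (the latter mirrors DESeq2's results naming). For each gene identifier it keeps the row with the highest absolute log2 fold change, then builds a named numeric vector sorted in decreasing order.

```r
# Hypothetical stand-in for the results imported from
# differential-expression_rnaseq_01.Rmd after mapIds() conversion;
# the values and column names are made up for illustration.
dge_df <- data.frame(
  entrez_id = c("7157", "7157", "1956", "672", NA),
  log2FoldChange = c(1.2, -2.5, 0.8, -0.3, 3.1),
  stringsAsFactors = FALSE
)

# Drop genes that did not map to an identifier
dge_df <- dge_df[!is.na(dge_df$entrez_id), ]

# Put the largest absolute log2 fold change first, then keep only the
# first row seen for each identifier (i.e., the highest |log2FC|)
dge_df <- dge_df[order(-abs(dge_df$log2FoldChange)), ]
dge_df <- dge_df[!duplicated(dge_df$entrez_id), ]

# Pre-ranked list: a named numeric vector in decreasing order
ranked_genes <- dge_df$log2FoldChange
names(ranked_genes) <- dge_df$entrez_id
ranked_genes <- sort(ranked_genes, decreasing = TRUE)
```

A vector like ranked_genes is the form clusterProfiler::GSEA() expects for its geneList argument; the gene sets themselves would be supplied through the TERM2GENE argument, for example as a two-column data frame of gene set names and gene identifiers taken from msigdbr() output.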
Note that the pathway analysis introductory paragraphs from #349 will be added here.
Are there any suggestions for other alternate strategies we may want to show in this RNA-seq example (compared to the microarray example), besides the use of a different species dataset?
I am about to file a draft PR for this issue; however, before doing so, I wanted to note here that this module's differential expression results file (produced using the SRP078441 RNA-seq dataset) generated no GSEA results. As there is only one RNA-seq differential expression example analysis, my thought is to use a similar strategy to the one in merged PR #362 and perform differential expression (perhaps using limma) on a different dataset.
Are there any thoughts or alternative solutions? cc: @cansavvy @jaclyn-taroni
I think this may be a case of me not fully understanding your meaning, but I wouldn't consider performing differential gene expression analysis and using the results to perform GSEA to be a similar strategy to performing GSVA and testing the scores for differential expression. I'd also say that I don't think we want to use limma for RNA-seq differential gene expression analysis when we've used DESeq2 heavily elsewhere, unless we had a very good reason that we were ready to get into. Information question: what gene sets did you use?
Gotcha. I guess by "similar strategy" I meant including the differential expression steps in the notebook, even though it is not a differential expression analysis example. Your question was very helpful, though! I have still been using the hallmark gene sets. I just tried running the analysis on the default (all) gene sets, and that produced results!
I am not sure what this means, but I will find out when you file the draft!
Seems like this can be closed?