AlexsLemonade/refinebio-examples

RNA-seq: Pathway Analysis - GSEA

Closed this issue · 7 comments

What are the goals of this new example analysis?

We have a GSEA example for microarray, but should create one for RNA-seq.

What kind of dataset will this need?

We just need ranked gene results from RNA-seq.

What steps should be included in this analysis?

I think most of the GSEA steps from the microarray example can stay the same. We can import the results from differential-expression_rnaseq_01.Rmd using a URL. But this won't be much different from differential-expression_rnaseq_01.Rmd, so we should discuss alternative strategies we might want to show here that we didn't show there.

What packages/methods do you recommend using or looking into for this analysis?

Still clusterProfiler

When this issue is addressed, note that the intro paragraph from #349 will need to be added here, and the table will need to be made to reflect the RNA-seq versions of the analyses.

A rough outline of the plan to tackle this issue is as follows (largely mimicking what we have in the microarray GSEA example):

  1. Load in the needed packages (clusterProfiler to run GSEA, msigdbr and species annotation package -- Homo sapiens in this case)
  2. Import the results from this repo's differential-expression_rnaseq_01.Rmd using a URL (noting that this is subject to change based on how the visualization of the GSEA results for this dataset looks later on in the example)
  3. For consistency with the microarray analyses using clusterProfiler, include the "getting familiar with clusterProfiler's options" section
  4. Isolate the hallmark gene sets using msigdbr()
  5. Perform gene identifier conversion (using the mapIds() function)
  6. Join the expression data and filter out any duplicate gene identifiers, keeping the entry with the highest absolute log2FC value (as opposed to the t-statistic, which was used in the microarray example, as it does not appear that t-statistic values are available in the results dataset from differential-expression_rnaseq_01.Rmd)
  7. Determine the pre-ranked gene list based on the gene level statistic from the previous step
  8. Perform GSEA using the GSEA() function from clusterProfiler, providing the pre-ranked gene list to the geneList argument and largely keeping the parameters the same as in the 02-microarray/pathway-analysis_microarray_02_gsea.Rmd example
  9. Preview the most negative and the most positive enrichment scores and visualize them using enrichplot::gseaplot()
  10. Save plots using ggsave()
  11. Write GSEA results to file
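
Steps 6 and 7 of the outline above could be sketched roughly as follows. This is a minimal base R illustration, not the code from the notebook: the data frame `deseq_df`, its column names (`entrez_id`, `log2FoldChange`), and the toy values are all assumptions made up for the example.

```r
# Hypothetical differential expression results after identifier conversion.
# Column names and values are assumptions for illustration only.
deseq_df <- data.frame(
  entrez_id = c("1", "1", "2", "3", NA, "4"),
  log2FoldChange = c(0.5, -2.0, 1.2, -0.3, 3.0, 0.0),
  stringsAsFactors = FALSE
)

# Drop rows whose identifier could not be mapped
mapped_df <- deseq_df[!is.na(deseq_df$entrez_id), ]

# Sort by absolute log2 fold change, largest first, so that the first
# occurrence of each identifier is the one with the highest absolute value
mapped_df <- mapped_df[order(abs(mapped_df$log2FoldChange), decreasing = TRUE), ]

# Keep only the first occurrence of each identifier
dedup_df <- mapped_df[!duplicated(mapped_df$entrez_id), ]

# Build a named numeric vector sorted in decreasing order of the statistic
ranked_genes <- dedup_df$log2FoldChange
names(ranked_genes) <- dedup_df$entrez_id
ranked_genes <- sort(ranked_genes, decreasing = TRUE)
```

A vector shaped like `ranked_genes` (gene identifiers as names, sorted from most positive to most negative) is the kind of pre-ranked input that step 8 would pass to the geneList argument of GSEA().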

Note that the pathway analysis introductory paragraphs from #349 will be added here.
Are there any suggestions for other alternate strategies we may want to show in this RNA-seq example (compared to the microarray example), besides the use of a different species dataset?

I am about to file a draft PR for this issue, however before doing so, I wanted to note here that this module's differential expression results file (produced using the SRP078441 RNA-seq dataset) generated no GSEA results. As there is only one RNA-seq differential expression example analysis, my thoughts are to resort to a similar strategy as we did in merged PR #362 and perform differential expression (perhaps using limma) on a different dataset.

Are there any thoughts or alternative solutions? cc: @cansavvy @jaclyn-taroni

As there is only one RNA-seq differential expression example analysis, my thoughts are to resort to a similar strategy as we did in merged PR #362 and perform differential expression (perhaps using limma) on a different dataset.

I think this may be a case of me not fully understanding your meaning, but I wouldn't consider performing differential gene expression analysis and using the results to perform GSEA to be a similar strategy to performing GSVA and testing the scores for differential expression. I'd also say that I don't think we want to use limma for RNA-seq differential gene expression analysis when we've used DESeq2 heavily elsewhere, unless we had a very good reason that we were ready to get into. Informational question: what gene sets did you use?

Gotcha, I guess by "similar strategy" I meant including the differential expression steps in the notebook, even though it is not necessarily a differential expression analysis example. However, your question was very helpful! I have still been using the hallmark gene sets. I just tried running the analysis on the default (all) gene sets and that produced results!

I have still been using the hallmark gene sets. I just tried running the analysis on the default (all) gene sets and that produced results!

I am not sure what this means, but I will find out when you file the draft!

Seems like this can be closed?