
Thesis: Identifying significantly enriched gene sets with NMF-derived metagenes and their difference vectors

nmf_gsea


An R Markdown notebook demonstrates improved identification of key pathways involved in three CNS tumour subtypes:

  • RNA-seq data is read, annotated, and cleaned.
  • The nsNMF (nonsmooth non-negative matrix factorization) algorithm reduces the data to 4 metagenes, including one for normal tissue.
  • Pairwise difference vectors between metagenes are calculated, and their GSEA results are compared with those of each individual metagene.
  • The difference vectors identify several key pathways that are missed when the metagenes are analysed alone.
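The dimensionality-reduction step in the list above can be sketched as follows. This is a minimal Python/NumPy illustration, not the notebook's R code. It follows the standard nonsmooth NMF formulation V ≈ W S H, with smoothing matrix S = (1 − θ)I + (θ/k)11ᵀ, and uses Kullback-Leibler multiplicative updates in which W is replaced by WS when updating H and H by SH when updating W. The function name nsnmf, the default θ = 0.5, the iteration count and the random initialisation are assumptions. The columns of W (genes × k) are the metagenes, and H holds each sample's loading on them.

```python
import numpy as np


def nsnmf(V, k, theta=0.5, n_iter=500, seed=0, eps=1e-12):
    """Sketch of nonsmooth NMF: V (genes x samples) ~ W @ S @ H.

    S = (1 - theta) * I + (theta / k) * ones((k, k)) smooths the factors.
    Returns W (genes x k metagenes), H (k x samples) and S (k x k).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    S = (1.0 - theta) * np.eye(k) + (theta / k) * np.ones((k, k))

    for _ in range(n_iter):
        # KL multiplicative update for H, using the smoothed basis W @ S.
        WS = W @ S
        R = V / (WS @ H + eps)
        H *= (WS.T @ R) / (WS.sum(axis=0)[:, None] + eps)
        # KL multiplicative update for W, using the smoothed coefficients S @ H.
        SH = S @ H
        R = V / (W @ SH + eps)
        W *= (R @ SH.T) / (SH.sum(axis=1)[None, :] + eps)
    return W, H, S
```

For example, W, H, S = nsnmf(V, k=4) on a non-negative genes × samples matrix V yields four metagene columns in W. Because each metagene's scale is arbitrary, normalising the columns of W (for example to unit sum) before taking differences is one reasonable choice, not necessarily the one used in the notebook.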

To view the results, clone the repository and extract the ...Results.tar.gz archives. They contain .RData files with the results of the nsNMF decompositions, the GSEA results for the decomposed RNA-seq data (Dataset 1), and an additional dataset that has undergone tumour purification (Dataset 2). A Docker image containing the R packages used can be downloaded here.
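Extracting the archives can be scripted. The minimal sketch below uses only the Python standard library. It assumes the archives sit somewhere inside the repository checkout and that their names end in Results.tar.gz; the function name extract_results is hypothetical. The extracted .RData files can then be loaded in R with load().

```python
import tarfile
from pathlib import Path


def extract_results(repo_dir="."):
    """Unpack every *Results.tar.gz under repo_dir next to its archive.

    Returns the paths of the extracted .RData files.
    """
    # Use the safer "data" extraction filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    rdata_files = []
    for archive in sorted(Path(repo_dir).rglob("*Results.tar.gz")):
        with tarfile.open(str(archive), "r:gz") as tar:
            tar.extractall(path=str(archive.parent), **kwargs)
            rdata_files += [archive.parent / m.name for m in tar.getmembers()
                            if m.name.endswith(".RData")]
    return rdata_files
```

Calling extract_results(".") from the repository root would unpack all matching archives and return the .RData paths to load.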

Pathway enrichment results are summarised in the out/tables folder and visualised in the out/figures folder. Leading-edge genes can also be exported from the listed DOSE::gseaResult objects.
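The leading edge is the subset of a gene set that drives its enrichment score. The sketch below is a Python/NumPy illustration, not DOSE's implementation. It computes the weighted running-sum enrichment score of the original GSEA method and returns the gene-set members at or before the peak for a positive score, or at or after the trough for a negative one. The function name, the weight exponent p = 1 and the input layout are assumptions. In DOSE results, the corresponding genes are typically listed in the core_enrichment column of the object's result table.

```python
import numpy as np


def enrichment_score(genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score and leading-edge genes.

    genes/scores: gene IDs and their ranking statistic (any order).
    Returns (es, leading_edge).
    """
    order = np.argsort(scores)[::-1]                  # highest score first
    ranked = [genes[i] for i in order]
    r = np.asarray(scores, dtype=float)[order]
    in_set = np.array([g in gene_set for g in ranked])
    n_hit = in_set.sum()
    if n_hit == 0 or n_hit == len(ranked):
        raise ValueError("gene set must overlap the ranked list only partially")

    w = np.abs(r) ** p * in_set                       # hits weighted by |score|^p
    p_hit = np.cumsum(w) / w.sum()
    p_miss = np.cumsum(~in_set) / (len(ranked) - n_hit)
    running = p_hit - p_miss
    peak = int(np.argmax(np.abs(running)))            # maximum deviation from zero
    es = running[peak]

    # Positive ES: hits up to the peak; negative ES: hits from the trough onwards.
    idx = range(peak + 1) if es >= 0 else range(peak, len(ranked))
    leading_edge = [ranked[i] for i in idx if in_set[i]]
    return float(es), leading_edge
```

The scores would be a ranking statistic such as metagene or difference-vector values. Significance (permutation p-values, normalised enrichment scores) is not computed in this sketch.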

The plots below show that subtracting the normal metagene from the medulloblastoma metagene exposes downregulation of the synaptic vesicle cycle pathway (KEGG hsa04721) in the medulloblastoma samples, which enrichment analysis of the medulloblastoma metagene alone did not detect.
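Subtracting one metagene from another amounts to an element-wise difference of two columns of W, after which the genes are re-ranked for a preranked GSEA. The toy sketch below (Python/NumPy; the function name, column indices and gene labels are hypothetical) shows why this can expose downregulation. A gene that scores moderately in the tumour metagene but higher in the normal metagene sits in the upper half of the tumour-only ranking, yet falls to the bottom of the difference ranking. In R, the sorted values would correspond to a decreasingly sorted, named numeric vector, the form of gene list that DOSE and clusterProfiler GSEA functions take.

```python
import numpy as np


def difference_ranking(W, genes, tumour_col, normal_col):
    """Rank genes by the difference vector W[:, tumour_col] - W[:, normal_col].

    W is a genes x k metagene matrix; returns (ranked gene names, sorted
    difference values), largest difference first.
    """
    d = W[:, tumour_col] - W[:, normal_col]   # element-wise difference of two metagenes
    order = np.argsort(d)[::-1]               # descending order for preranked GSEA
    return [genes[i] for i in order], d[order]


# Toy metagene matrix: column 0 = tumour metagene, column 1 = normal metagene.
W = np.array([
    [0.9, 0.85],  # "shared": high in both metagenes
    [0.8, 0.10],  # "tumour_up": high only in the tumour metagene
    [0.5, 0.95],  # "normal_up": moderate in tumour, higher in normal
    [0.2, 0.10],  # "background": low in both
])
genes = ["shared", "tumour_up", "normal_up", "background"]

ranked, d = difference_ranking(W, genes, tumour_col=0, normal_col=1)
print(ranked)  # ['tumour_up', 'background', 'shared', 'normal_up']
```

Ranked by the tumour metagene alone, "normal_up" is third of four; ranked by the difference vector, it is last with a negative value, which is where a downregulated gene set would concentrate.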

emapplots: MN vs M

dotplots: MN vs M

cnetplots: MN vs M