Performs GeneSet Enrichment Analysis (GSEA) based on one-tail Fisher's Exact Test (Hypergeometric test). Implemented in R.
Performs GeneSet Enrichment Analysis (GSEA) based on one-tail Fisher's Exact Test
get.enrichment.test(file.query, batch, file.background, dir.gmt, threshold, dir.out)
file.query
: Path of the file containing the list of genes for which the enrichment test is to be performed. One gene per line.
batch
: Name of the output folder. This folder is created under dir.out.
file.background
: Path of the file containing the list of background genes. One gene per line.
dir.gmt
: Path of the folder containing the geneset (.gmt) files.
threshold
: FDR cutoff. Genesets with an FDR less than threshold are selected.
dir.out
: Path of the output folder. Defaults to the ''enrichment'' folder.
The output file is a tab-separated table containing the following columns:
Category
: Name of the geneset.
pvalue
: P-value of Fisher's exact test.
fdr
: False Discovery Rate (FDR) adjusted p-value. Uses the ''Benjamini Hochberg'' method for p-value correction.
overlap.percent
: Percentage of query genes that are also found in the enriched geneset.
overlap.genes
: List of query genes that are also found in the enriched geneset.
Description
: Description of the geneset.
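As a rough illustration of how the overlap.percent, overlap.genes and fdr columns described above could be derived, here is a minimal sketch in base R. The gene symbols and p-values are made-up toy values, and the variable names are only illustrative; the script itself may compute these columns differently.

```r
# Toy data: gene symbols and p-values are invented for illustration only.
query   <- c("G1", "G2", "G3", "G4")   # query gene list
geneset <- c("G2", "G4", "G7", "G9")   # members of one geneset

# Query genes that are also members of the geneset
overlap.genes <- intersect(query, geneset)

# Percentage of query genes found in the geneset
overlap.percent <- 100 * length(overlap.genes) / length(query)

# Benjamini-Hochberg adjustment of raw p-values (one per geneset)
pvalue <- c(0.001, 0.02, 0.03, 0.5)
fdr    <- p.adjust(pvalue, method = "BH")
```

The FDR adjustment is applied across all tested genesets at once, so the adjusted value of one geneset depends on how many genesets were tested.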
Performs one-tail Fisher's Exact Test (Hypergeometric test)
get.fisher.exact.test(dat.genesets, genes.queryset, genes.refset, ct)
dat.genesets
: Data frame containing genesets (output generated by the parseGMT() function).
genes.queryset
: List of genes for which the enrichment test is to be performed.
genes.refset
: List of background genes.
ct
: FDR cutoff. Genesets with an FDR less than ct are selected.
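To make the test itself concrete, the following sketch runs a one-tailed Fisher's exact test for a single toy geneset and checks it against the equivalent hypergeometric tail probability. It assumes the query genes are a subset of the background genes; the toy gene names and the 2x2 table layout are assumptions for illustration and are not taken from get.fisher.exact.test().

```r
# Toy inputs (invented): 20 background genes, 5 query genes, one geneset.
genes.refset   <- paste0("G", 1:20)
genes.queryset <- c("G1", "G2", "G3", "G4", "G5")
geneset        <- c("G1", "G2", "G3", "G10")

# Membership of every background gene in the query list and in the geneset
in.query <- genes.refset %in% genes.queryset
in.set   <- genes.refset %in% geneset

# 2x2 contingency table with the TRUE/TRUE cell in the top-left corner
tab <- table(query   = factor(in.query, levels = c(TRUE, FALSE)),
             geneset = factor(in.set,   levels = c(TRUE, FALSE)))

# One-tailed test: is the geneset over-represented among the query genes?
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

# Same probability from the hypergeometric distribution: P(X >= k)
k <- sum(in.query & in.set)
p.hyper <- phyper(k - 1, m = sum(in.set), n = sum(!in.set),
                  k = sum(in.query), lower.tail = FALSE)
```

Both p.fisher and p.hyper give the probability of seeing at least k overlapping genes by chance, which is why the test is described as both a Fisher's exact test and a hypergeometric test.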
Parses GMT files into the format required for the enrichment test
parseGMT(dir.gmt)
dir.gmt
: path of folder under which genesets (.gmt) files are stored
The output file is a tab-separated table containing the following columns:
Category
: Name of the geneset.
Genesets
: List of genes in the geneset, separated by a colon (':').
Description
: Description of the geneset.
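For orientation, here is a minimal sketch of turning GMT-formatted lines into a table with the three columns listed above. In a GMT file each line holds a geneset name, a description, and then the member genes, all tab-separated. The two input lines are invented toy data, and this is not the actual parseGMT() implementation.

```r
# Two toy GMT lines (invented): name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt.lines <- c("SET_A\tToy description A\tG1\tG2\tG3",
               "SET_B\tToy description B\tG4\tG5")

# Split every line on tabs
fields <- strsplit(gmt.lines, "\t", fixed = TRUE)

# Build the table: the genes (fields 3 onwards) are joined with ':'
dat.genesets <- data.frame(
  Category    = vapply(fields, function(f) f[1], character(1)),
  Genesets    = vapply(fields, function(f) paste(f[-(1:2)], collapse = ":"),
                       character(1)),
  Description = vapply(fields, function(f) f[2], character(1)),
  stringsAsFactors = FALSE
)
```

Reading real files would replace gmt.lines with the result of readLines() on each .gmt file in the folder.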
Load the R script run_GSEAfisher.R
source("run_GSEAfisher.R")
Parse GMT files. To be used only once for a file. If the GMT files are already parsed, skip this step.
parseGMT("genesets/Msigdb")
Now run the enrichment test
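# Define paths
dir.wrk <- getwd()  # assumption: dir.wrk is not defined in the original; here it is the project root
dir.db <- file.path("genesets/Msigdb")
file.bg <- file.path(dir.wrk, "data/background_genelist_test.txt")
file.genelist <- file.path(dir.wrk, "data/genelist_test.txt")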
# Perform Enrichment Test
get.enrichment.test(file.query=file.genelist, batch="test", file.background=file.bg, dir.gmt=dir.db, threshold=0.01)
The output files can be found under the enrichment/TEST folder.