
An R package performing hypergeometric enrichment analysis while managing identifier translation and multiple voting

Primary language: R. License: GNU Lesser General Public License v2.1 (LGPL-2.1).

SetFisher is an R package that manages the application of phyper() to hypergeometric-distribution (Fisher's exact test) analyses in support of gene set enrichment analysis. The novel components of the package are:

  • Namespace mapping: An optional "Mapping Matrix" can be provided, which maps input IDs to the different identifiers used in the ontologies (e.g. from Affymetrix probesets to Entrez GeneIDs).
  • Multiple-voting accommodation: Fisher's exact test is very sensitive to dependencies among its inputs (detecting dependence is, after all, what it is for). If there are 1:many or many:many relationships between the experimental assays and the target genes, this technical dependency can obscure biological relationships; for example, a single probeset that maps to several genes annotated to the same term would otherwise be counted several times. SetFisher attempts to manage these issues by tracking "fractional counts" through the mapping matrix.
  • Automated filtering: Ontologies can optionally be "trimmed" to remove terms with very few assigned genes or with a very large number of them. Similarly, genes can optionally be filtered to remove those with insufficient ontological support. This helps remove "speculative" genes that are either fictional (they do not actually exist) or extremely esoteric; such genes tend to inflate significance scores.
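The mapping and filtering steps listed above might look roughly like the following Python sketch. It is not SetFisher's API: the package does not document here how its fractional counts are computed, so the scheme below (each selected probe carries one vote, split evenly across the genes it maps to, i.e. a row-normalised mapping matrix) is an assumption, as are the function names, the size thresholds, and the "annotated to at least N terms" support rule. How such fractional counts are then turned into arguments for phyper() is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


def fractional_gene_counts(selected_probes: Iterable[str],
                           mapping: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical scheme: each selected probe carries one vote, split
    evenly across the genes it maps to (a row-normalised mapping matrix)."""
    counts: Dict[str, float] = defaultdict(float)
    for probe in selected_probes:
        genes = mapping.get(probe, set())
        if not genes:
            continue  # probes with no mapped gene contribute nothing
        share = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += share
    return dict(counts)


def fractional_term_count(gene_counts: Dict[str, float], term: Set[str]) -> float:
    """Fractional overlap between the selected genes and one term."""
    return sum(w for gene, w in gene_counts.items() if gene in term)


def trim_ontology(terms: Dict[str, Set[str]],
                  min_size: int, max_size: int) -> Dict[str, Set[str]]:
    """Drop terms with too few or too many assigned genes."""
    return {t: g for t, g in terms.items() if min_size <= len(g) <= max_size}


def supported_genes(terms: Dict[str, Set[str]], min_terms: int) -> Set[str]:
    """Genes annotated to at least min_terms terms (hypothetical support rule)."""
    support = Counter(gene for genes in terms.values() for gene in genes)
    return {gene for gene, n in support.items() if n >= min_terms}


if __name__ == "__main__":
    # p2 maps to two genes, so each receives half a vote.
    mapping = {"p1": {"A"}, "p2": {"A", "B"}, "p3": {"C"}}
    counts = fractional_gene_counts(["p1", "p2"], mapping)
    print(counts)  # A: 1.5, B: 0.5
    print(fractional_term_count(counts, {"B", "C"}))  # 0.5
```

Splitting each probe's vote keeps the total weight equal to the number of mapped, selected probes, so a probeset that hits several genes in one term no longer counts as several independent observations.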

More details can be found in How it Works, which lays out the philosophy and basic operation of the package.
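For orientation, the per-term over-representation test that phyper() is used for can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not SetFisher's code: the names hypergeom_upper_tail and term_enrichment are hypothetical, and the sketch uses the common phyper(q - 1, m, n, k, lower.tail = FALSE) formulation with integer counts, where q is the overlap between the input list and the term, m the term size, n the number of other genes in the universe, and k the size of the input list.

```python
from math import comb


def hypergeom_upper_tail(q: int, m: int, n: int, k: int) -> float:
    """Return P(X >= q) for X ~ Hypergeometric(m, n, k).

    m: genes annotated to the term, n: other genes in the universe,
    k: genes in the input list. Mirrors R's
    phyper(q - 1, m, n, k, lower.tail = FALSE).
    """
    total = comb(m + n, k)
    # X can be at most min(m, k); terms with k - x > n are zero because
    # math.comb returns 0 when asked to choose more items than exist.
    hits = range(max(q, 0), min(m, k) + 1)
    return sum(comb(m, x) * comb(n, k - x) for x in hits) / total


def term_enrichment(selected: set, term: set, universe: set) -> float:
    """Over-representation p-value of one term for a selected gene list."""
    sel = selected & universe  # only genes in the universe are counted
    members = term & universe
    q = len(sel & members)     # overlap: selected genes annotated to the term
    m = len(members)
    n = len(universe) - m
    k = len(sel)
    return hypergeom_upper_tail(q, m, n, k)


if __name__ == "__main__":
    universe = {"A", "B", "C", "D"}
    # Both selected genes fall in a 2-gene term: P = 1/6
    print(term_enrichment({"A", "B"}, {"A", "B"}, universe))
```

Restricting both the input list and the term to a shared universe matters: genes outside the universe would otherwise distort m, n, and k, which is one reason the filtering step above affects significance.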

Sample analyses and ontology matrices will be published at a later date.