Cerberus

Cerberus is a set of tools designed to characterize and enhance transcriptome annotations. Currently Cerberus can do the following:

  • Represent transcript start sites (TSSs) and transcript end sites (TESs) as BED regions rather than as single-base-pair ends
  • Integrate intron chains (ICs) from multiple transcriptome annotations (GTFs) to create a transcriptome that is the union of them all
  • Integrate TSSs and TESs from multiple GTFs, as well as from outside BED sources, to create end annotations that are the union of them all
  • Number intron chains, TSSs, and TESs found by their priority in a reference GTF
  • Use the enhanced intron chain and 5'/3' end sets to annotate an existing GTF transcriptome with transcript triplets, and update the GTF and its corresponding abundance matrices to reflect the new names and identities of the transcripts
  • Compute gene triplets for different sets of isoforms for each gene based on the TSSs, ICs, and TESs used among the isoforms of the gene
  • Generate plots (see examples below) to visualize gene triplets on the gene structure simplex
  • Compute centroids of gene triplet coordinate distributions
  • Compute pairwise gene structure simplex distances between pairs of gene triplets
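
To make the last few items more concrete, here is a minimal, standalone sketch of what a gene triplet, a simplex coordinate, a centroid, and a pairwise distance could look like. It does not use the Cerberus API: the function names are hypothetical, and it assumes that a gene triplet is a tuple of counts (number of TSSs, ICs, and TESs), that simplex coordinates are the plain proportions of those counts, and that distance is measured with the Jensen-Shannon distance. Cerberus itself may define any of these differently, so please check the documentation for the actual behavior.

```python
import math

# Hypothetical helpers for illustration only; these are NOT Cerberus functions.

def simplex_coords(triplet):
    """Scale an (n_tss, n_ic, n_tes) count triplet so its entries sum to 1.

    Assumption: coordinates are plain proportions of the three counts.
    """
    total = sum(triplet)
    if total == 0:
        raise ValueError("triplet must contain at least one nonzero count")
    return tuple(x / total for x in triplet)


def centroid(points):
    """Arithmetic mean of simplex points (the mean also sums to 1)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def _kl(p, q):
    """Kullback-Leibler divergence in bits, skipping zero entries of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance (base 2), which lies between 0 and 1."""
    m = tuple((pi + qi) / 2 for pi, qi in zip(p, q))
    divergence = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


# Two made-up gene triplets: (n_tss, n_ic, n_tes)
gene_a = simplex_coords((1, 2, 1))
gene_b = simplex_coords((4, 2, 2))

center = centroid([gene_a, gene_b])
dist = js_distance(gene_a, gene_b)
```

In this sketch a gene whose isoforms differ mostly in their intron chains lands near the IC corner of the simplex, while one with many alternative start sites lands near the TSS corner; the distance between two genes is 0 when their proportions are identical.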

[Figure: density plot on the gene structure simplex]

[Figure: scatter plot on the gene structure simplex]

Please visit the Cerberus website for documentation.

Note: Cerberus is under active development. Please feel free to open an issue or email me ( freese {at} uci.edu ) if you're interested in using it!