TEquant

Tools for manipulating quantification of transposable element reads from RNA-seq data.

Primary language: Python

Workflow

  • Map reads to the reference genome

  • Create a GTF file annotating the features to count (genes and individual TE loci)

  • Create a table relating the different levels of the TE hierarchy (class, family, gene, transcript)

  • Run featureCounts on each sample

    • Settings depend on the library type (e.g. stranded or unstranded; featureCounts option -s)
    • Multimapping reads can be either ignored or counted (option -M)
    • Note that by default featureCounts discards reads that overlap more than one feature (option -O assigns them to all overlapping features instead)
  • Merge featureCounts outputs across samples

    • Using merge_featurecounts_lowmem.py
    • This generates a single file with counts for all samples, plus some metadata (e.g. the total number of mapped reads in each sample). Metadata entries are marked with a '_' prefix (e.g. '_assigned' for reads assigned to a feature; see the featureCounts documentation for details).
  • Subset the counts and sum them for each transcript type:

    • Using split_and_sum_TE_counts
    • Creates a count matrix for each type of transcript:
      • gene_id
      • TEclass_id (most general)
      • TEfamily_id
      • TEgene_id
      • TEtranscript_id (most specific - individual loci)
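The workflow does not specify the layout of the GTF. As a minimal sketch, assuming each TE locus is written as one 'exon' feature whose attributes carry every level of the hierarchy, one GTF line could be produced as follows. The attribute names TEclass_id, TEfamily_id, TEgene_id and TEtranscript_id are borrowed from the count-matrix levels listed above; the function name te_gtf_line, the source label 'TEannot' and the example TE names are hypothetical.

```python
# Hypothetical sketch: attribute names mirror the count-matrix levels above;
# they are an assumption, not a documented TEquant GTF format.


def te_gtf_line(chrom, start, end, strand, te_class, te_family, te_gene,
                locus_id, source="TEannot"):
    """Return one tab-separated GTF line describing a single TE locus.

    start and end are 1-based and inclusive, as the GTF format requires.
    """
    attributes = [
        # GTF2 requires gene_id and transcript_id; mapping them to the TE
        # gene and to the individual locus is an assumption.
        ("gene_id", te_gene),
        ("transcript_id", locus_id),
        ("TEclass_id", te_class),
        ("TEfamily_id", te_family),
        ("TEgene_id", te_gene),
        ("TEtranscript_id", locus_id),
    ]
    attribute_field = " ".join(f'{key} "{value}";' for key, value in attributes)
    fields = [chrom, source, "exon", str(start), str(end), ".", strand, ".",
              attribute_field]
    return "\t".join(fields)


if __name__ == "__main__":
    print(te_gtf_line("chr1", 10001, 10468, "+", "LINE", "L1", "L1HS", "L1HS_dup1"))
```

Using the feature type 'exon' matches featureCounts' default (-t exon), so no extra option is needed at the counting step.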
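The table relating the TE hierarchy levels could be derived from the GTF itself. This sketch assumes every TE feature in the GTF carries the attributes TEtranscript_id, TEgene_id, TEfamily_id and TEclass_id (names taken from the count-matrix list above); features lacking any of them, such as ordinary genes, are skipped, and one row is kept per locus. The function names are hypothetical.

```python
import csv
import io
import re

# Matches key "value" pairs in the ninth (attribute) column of a GTF line.
ATTRIBUTE_RE = re.compile(r'(\S+)\s+"([^"]*)";?')

# Most specific level first; these attribute names are assumptions.
LEVELS = ["TEtranscript_id", "TEgene_id", "TEfamily_id", "TEclass_id"]


def hierarchy_rows(gtf_lines):
    """Return one [TEtranscript_id, TEgene_id, TEfamily_id, TEclass_id] row per TE locus."""
    seen = set()
    rows = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # malformed line without an attribute column
        attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
        if not all(level in attributes for level in LEVELS):
            continue  # not a TE feature (e.g. an ordinary gene)
        row = tuple(attributes[level] for level in LEVELS)
        if row not in seen:  # a locus may span several GTF lines
            seen.add(row)
            rows.append(list(row))
    return rows


def rows_to_tsv(rows):
    """Render the hierarchy as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(LEVELS)
    writer.writerows(rows)
    return buffer.getvalue()
```

Because hierarchy_rows accepts any iterable of lines, an open GTF file handle can be passed to it directly, and rows_to_tsv returns the finished table as text ready to be written out.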
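The featureCounts settings mentioned in the workflow map onto command-line options: -s for strandedness (0 unstranded, 1 stranded, 2 reversely stranded), -M to count multimapping reads and -O to count reads overlapping several features. The helper below is a hypothetical sketch, not the settings TEquant itself uses; it only assembles the argument list, and counting at the TEtranscript_id level (via -g) as well as the example file names are assumptions.

```python
import shlex


def featurecounts_cmd(annotation_gtf, output_path, bam_files,
                      strandedness=0, count_multimappers=False,
                      count_multi_overlaps=False, feature_type="exon",
                      attribute="TEtranscript_id", threads=1):
    """Build (but do not run) a featureCounts command as an argument list.

    strandedness: 0 = unstranded, 1 = stranded, 2 = reversely stranded (-s).
    """
    if strandedness not in (0, 1, 2):
        raise ValueError("strandedness must be 0, 1 or 2")
    cmd = [
        "featureCounts",
        "-a", annotation_gtf,   # GTF annotation
        "-o", output_path,      # count table; a .summary file is written alongside
        "-t", feature_type,     # feature type to count (GTF column 3)
        "-g", attribute,        # attribute used to group features (assumed level)
        "-s", str(strandedness),
        "-T", str(threads),
    ]
    if count_multimappers:
        cmd.append("-M")  # also count multimapping reads
    if count_multi_overlaps:
        cmd.append("-O")  # assign reads to all overlapping features
    cmd.extend(bam_files)
    return cmd


if __name__ == "__main__":
    cmd = featurecounts_cmd("TE.gtf", "sample1.counts.txt", ["sample1.bam"],
                            strandedness=2, count_multimappers=True)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the chosen settings easy to log per sample and to check before launching a batch of runs.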
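The internals of merge_featurecounts_lowmem.py are not described here. One plausible low-memory approach, sketched below under stated assumptions, streams all per-sample files in parallel so that only one line per file is held in memory at a time. It assumes each sample's count file and its .summary file were produced with the same annotation (so features appear in the same order in every file), and it lower-cases the summary status names to build the '_'-prefixed metadata rows (e.g. Assigned becomes _assigned). The function names and the exact row labels are hypothetical.

```python
import csv
import io
import sys
from itertools import zip_longest


def _rows(handle, skip_prefixes):
    """Yield (id, count) pairs: id from the first column, count from the last.

    Blank lines and lines starting with any of skip_prefixes are ignored.
    """
    for line in handle:
        if not line.strip() or line.startswith(skip_prefixes):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[-1]


def _zip_same_length(iterators):
    """Like zip(), but raise instead of silently truncating."""
    for entries in zip_longest(*iterators):
        if any(entry is None for entry in entries):
            raise ValueError("input files have different numbers of rows")
        yield entries


def merge_counts(count_handles, summary_handles, sample_names):
    """Yield the merged table row by row, reading one line per file at a time."""
    yield ["feature"] + list(sample_names)
    # Metadata rows from the .summary files, e.g. 'Assigned' -> '_assigned'.
    summaries = [_rows(h, ("Status\t",)) for h in summary_handles]
    for entries in _zip_same_length(summaries):
        yield ["_" + entries[0][0].lower()] + [count for _, count in entries]
    # Feature rows: skip the '#' program line and the 'Geneid' header line.
    counts = [_rows(h, ("#", "Geneid\t")) for h in count_handles]
    for entries in _zip_same_length(counts):
        ids = {feature_id for feature_id, _ in entries}
        if len(ids) != 1:
            raise ValueError(f"feature order differs between files: {sorted(ids)}")
        yield [entries[0][0]] + [count for _, count in entries]


if __name__ == "__main__":
    # Tiny in-memory example with two samples; real use would open files.
    header = "# Program:featureCounts\nGeneid\tChr\tStart\tEnd\tStrand\tLength\t"
    count_a = io.StringIO(header + "a.bam\nL1HS_dup1\tchr1\t10001\t10468\t+\t468\t5\n")
    count_b = io.StringIO(header + "b.bam\nL1HS_dup1\tchr1\t10001\t10468\t+\t468\t3\n")
    summary_a = io.StringIO("Status\ta.bam\nAssigned\t5\n")
    summary_b = io.StringIO("Status\tb.bam\nAssigned\t3\n")
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerows(merge_counts([count_a, count_b], [summary_a, summary_b], ["a", "b"]))
```

Checking that every file reports the same feature ID on each line guards the main assumption of this streaming design; a dictionary-based merge would tolerate different orders but would hold the whole table in memory.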
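Summing locus-level counts up the hierarchy can be sketched with plain dictionaries. This is not the split_and_sum_TE_counts implementation; it assumes counts maps each feature ID to a list of per-sample integers (including '_'-prefixed metadata entries, which are skipped), that hierarchy maps each TEtranscript_id to its TEgene_id, TEfamily_id and TEclass_id (e.g. loaded from the hierarchy table created earlier in the workflow), and that any other feature is an ordinary gene. The function names and example IDs are hypothetical.

```python
# Hypothetical sketch of summing locus-level counts up the TE hierarchy;
# not the split_and_sum_TE_counts implementation.

LEVELS = ["TEtranscript_id", "TEgene_id", "TEfamily_id", "TEclass_id"]


def _add(matrix, key, values):
    """Add per-sample values to matrix[key], creating the entry if needed."""
    if key in matrix:
        matrix[key] = [total + value for total, value in zip(matrix[key], values)]
    else:
        matrix[key] = list(values)


def split_and_sum(counts, hierarchy):
    """Return {level: {id: [per-sample counts]}} for gene_id and each TE level.

    counts:    feature id -> list of per-sample integer counts
    hierarchy: TEtranscript_id -> {"TEgene_id": ..., "TEfamily_id": ..., "TEclass_id": ...}
    """
    matrices = {"gene_id": {}, **{level: {} for level in LEVELS}}
    for feature_id, values in counts.items():
        if feature_id.startswith("_"):
            continue  # metadata such as '_assigned' is not a feature
        if feature_id in hierarchy:
            _add(matrices["TEtranscript_id"], feature_id, values)
            for level in LEVELS[1:]:
                _add(matrices[level], hierarchy[feature_id][level], values)
        else:
            _add(matrices["gene_id"], feature_id, values)  # assumed ordinary gene
    return matrices


if __name__ == "__main__":
    hierarchy = {
        "L1HS_dup1": {"TEgene_id": "L1HS", "TEfamily_id": "L1", "TEclass_id": "LINE"},
        "L1HS_dup2": {"TEgene_id": "L1HS", "TEfamily_id": "L1", "TEclass_id": "LINE"},
    }
    counts = {"_assigned": [100, 80], "ENSG0001": [7, 1],
              "L1HS_dup1": [5, 3], "L1HS_dup2": [1, 1]}
    print(split_and_sum(counts, hierarchy)["TEgene_id"])  # {'L1HS': [6, 4]}
```

If reads were counted at several overlapping loci (option -O) or multimapping reads were kept (-M), the summed counts at the higher levels can include the same read more than once.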