Primary language: R. License: MIT.

Stay proActiv!

proActiv: Estimation of Promoter Activity from RNA-Seq data

proActiv is an R package that estimates promoter activity from RNA-Seq data. It takes aligned reads and genome annotations as input and returns absolute and relative promoter activity as output. The package can be used to identify active promoters and alternative promoters. The method is described in detail in Demircioglu et al.

Additional data on differential promoters in tissues and cancers from TCGA, ICGC, GTEx, and PCAWG can be downloaded here: https://jglab.org/data-and-software/

Installation

proActiv can be installed from GitHub with:

library("devtools")
devtools::install_github("GoekeLab/proActiv")

Estimate Promoter Activity

This basic example estimates promoter activity from a set of RNA-Seq samples aligned with TopHat2 (or STAR). proActiv uses the junction files produced by the TopHat2 (or STAR) alignment, together with a set of annotation objects that describe the associations between promoters, transcripts, and genes, to calculate promoter activity.

library(proActiv)

# Preprocessed annotations are available as part of the R package for the human genome (hg19):
# proActiv::promoterAnnotationData.gencode.v19

# The paths and labels for samples
junctionFiles <- list.files(system.file('extdata/tophat2', package = 'proActiv'), full.names = TRUE)

# for STAR alignment
# junctionFiles <- list.files(system.file('extdata/star', package = 'proActiv'), full.names = TRUE)

junctionFileLabels <- paste0('s', seq_along(junctionFiles))

# Count the total number of junction reads for each promoter
promoterCounts <- calculatePromoterReadCounts(proActiv::promoterAnnotationData.gencode.v19,
                                                      junctionFilePaths = junctionFiles,
                                                      junctionFileLabels =  junctionFileLabels,
                                                      junctionType = 'tophat')  # use junctionType = 'star' for STAR aligned reads

# Normalize promoter read counts by DESeq2 (optional)
normalizedPromoterCounts <- normalizePromoterReadCounts(promoterCounts)

# Calculate absolute promoter activity
absolutePromoterActivity <- getAbsolutePromoterActivity(normalizedPromoterCounts,
                                                               proActiv::promoterAnnotationData.gencode.v19)
# Calculate gene expression
geneExpression <- getGeneExpression(absolutePromoterActivity)

# Calculate relative promoter activity
relativePromoterActivity <- getRelativePromoterActivity(absolutePromoterActivity,
                                                               geneExpression)
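
To make the relationship between the three outputs concrete, here is a toy sketch in base R. It assumes, following the description in Demircioglu et al., that gene expression is the sum of the absolute activities of a gene's promoters and that relative activity is each promoter's share of that sum. The matrix values, promoter ids, and gene ids are made up for illustration, and proActiv's own functions may apply additional transformations that are not shown here.

```r
# Toy absolute promoter activity: rows are promoters, columns are samples.
# All values and ids are invented for illustration only.
absoluteActivity <- matrix(
  c(4, 2,
    1, 2,
    3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("s1", "s2"))
)

# Hypothetical promoter-to-gene assignment: p1 and p2 belong to geneA, p3 to geneB.
geneId <- c(p1 = "geneA", p2 = "geneA", p3 = "geneB")

# Gene expression (assumed definition): sum of absolute activity over a gene's promoters.
geneExpressionToy <- rowsum(absoluteActivity, group = geneId)

# Relative activity (assumed definition): promoter activity divided by gene expression.
# Genes with zero expression in a sample give NaN (0 / 0).
relativeActivity <- absoluteActivity / geneExpressionToy[geneId, , drop = FALSE]

print(relativeActivity)
```

In this toy data, p1 and p2 split the activity of geneA, while p3 is the only promoter of geneB, so its relative activity is 1 wherever geneB is expressed.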

Annotation and Example Data

Pre-calculated promoter annotation data for Gencode v19 (GRCh37) is available as part of the proActiv package. The PromoterAnnotation object has 4 slots:

  • reducedExonRanges : The reduced first exon ranges for each promoter with promoter metadata for Gencode v19
  • promoterIdMapping : The id mapping between transcript ids, names, TSS ids, promoter ids and gene ids for Gencode v19
  • annotatedIntronRanges : The intron ranges annotated with the promoter information for Gencode v19
  • promoterCoordinates : Promoter coordinates (TSS) with gene id and internal promoter state for Gencode v19

Example junction files as produced by TopHat2 and STAR are available as external data. The reference genome used for alignment is Gencode v19 (GRCh37). The TopHat2 and STAR example files (5 files each) can be found in the ‘extdata/tophat2’ and ‘extdata/star’ folders, respectively.

Example TopHat2 files:

  • extdata/tophat2/sample1.bed
  • extdata/tophat2/sample2.bed
  • extdata/tophat2/sample3.bed
  • extdata/tophat2/sample4.bed
  • extdata/tophat2/sample5.bed

Example STAR files:

  • extdata/star/sample1.junctions
  • extdata/star/sample2.junctions
  • extdata/star/sample3.junctions
  • extdata/star/sample4.junctions
  • extdata/star/sample5.junctions
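
Because the example files for the two aligners differ by extension (.bed for TopHat2, .junctions for STAR), the junctionType argument could be chosen from the file names. The helper below is a hypothetical convenience function, not part of proActiv, and it assumes your own files follow the same naming as the bundled examples.

```r
# Hypothetical helper (not part of proActiv): pick the junctionType value
# from the file extension used by the bundled example files.
guessJunctionType <- function(paths) {
  if (length(paths) == 0) {
    stop("No junction files supplied")
  }
  ext <- tolower(tools::file_ext(paths))
  # Assumed mapping, based only on the example file names listed above.
  types <- unname(c(bed = "tophat", junctions = "star")[ext])
  if (anyNA(types)) {
    stop("Unrecognised junction file extension: ",
         paste(unique(ext[is.na(types)]), collapse = ", "))
  }
  if (length(unique(types)) != 1) {
    stop("Junction files from different aligners should not be mixed")
  }
  types[1]
}

guessJunctionType(c("sample1.bed", "sample2.bed"))
guessJunctionType("sample1.junctions")
```

The result can then be passed as junctionType to calculatePromoterReadCounts, instead of hard-coding 'tophat' or 'star'.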

Creating your own promoter annotations

proActiv provides functions to create promoter annotation objects for any genome. Here we describe how the annotation can be created using a TxDb object (please see the TxDb documentation for how to create annotations from a GTF file).

A TxDb object for the human genome version hg19 (GRCh37) can be downloaded here: inputFiles

library(GenomicRanges)
library(GenomicFeatures)
library(GenomicAlignments)
library(dplyr)
library(proActiv)

# Load the txdb object for your annotation of choice (Gencode v19 used here)
txdb <- loadDb('./inputFiles/annotation/gencode.v19.annotation.sqlite')

# The species argument to be used for GenomeInfoDb::keepStandardChromosomes
species <- 'Homo_sapiens'
# The number of cores to be used for parallel execution (mc.cores argument for parallel::mclapply), optional
numberOfCores <- 1

### Annotation data preparation
promoterAnnotationData <- preparePromoterAnnotationData(txdb, species = species, numberOfCores = numberOfCores)

# Retrieve the id mapping between transcripts, TSSs, promoters and genes
head(promoterIdMapping(promoterAnnotationData))

# Retrieve promoter coordinates
head(promoterCoordinates(promoterAnnotationData))
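
One use of the id mapping is to find genes with more than one annotated promoter, which are the candidates for alternative promoter usage. The sketch below uses a small made-up data frame as a stand-in for the mapping. The column names (transcriptName, promoterId, geneId) and all ids are assumptions for illustration, so check them against the output of head(promoterIdMapping(promoterAnnotationData)) before adapting it.

```r
# Toy stand-in for the id mapping; column names and ids are assumptions.
idMapping <- data.frame(
  transcriptName = c("tx1", "tx2", "tx3", "tx4"),
  promoterId     = c(1, 1, 2, 3),
  geneId         = c("geneA", "geneA", "geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Count the distinct promoters annotated for each gene.
promotersPerGene <- tapply(
  idMapping$promoterId,
  idMapping$geneId,
  function(ids) length(unique(ids))
)

# Genes with more than one annotated promoter.
multiPromoterGenes <- names(promotersPerGene)[promotersPerGene > 1]
print(multiPromoterGenes)
```

In the toy data, geneA has two promoters (one shared by tx1 and tx2, one used by tx3), while geneB has a single promoter.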

Release History

Initial Release 0.1.0

Release date: 19th May 2020

This release corresponds to the proActiv version used by Demircioglu et al.

Limitations

proActiv will not provide promoter activity estimates for promoters which are not uniquely identifiable from splice junctions (single exon transcripts, promoters which overlap with internal exons).

Reference

If you use proActiv, please cite:

Demircioğlu, Deniz, et al. “A Pan-cancer Transcriptome Analysis Reveals Pervasive Regulation through Alternative Promoters.” Cell 178.6 (2019): 1465-1477.

Contributors

proActiv is developed and maintained by Deniz Demircioglu, Joseph Lee, and Jonathan Göke.

Stay proActiv!