Intergenic Transcription

Code to reproduce the analysis and figures for "Intergenic RNA mainly derives from nascent transcripts of known genes", bioRxiv, 2020.

Requirements

stringtie
igvtools
gffcompare
R
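Before starting, it may help to confirm that the tools listed above are reachable from the command line. The following is a minimal sketch, not part of the repository: it assumes each tool is installed under the executable name shown in the list, and it uses Rscript as a stand-in for checking that R is installed.

```python
import shutil

# Executable names are assumptions based on the list above;
# "Rscript" is used as a proxy for a working R installation.
REQUIRED_TOOLS = ["stringtie", "igvtools", "gffcompare", "Rscript"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

An empty result only shows that the executables exist; it does not check their versions.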

Required R packages:

GenomicFeatures
rtracklayer
data.table
ggplot2
ggpubr
cowplot
ggthemes
ggsci
ggforce
ggExtra
ggrepel
scales
DT
circlize
BSgenome.Hsapiens.UCSC.hg38
ggbio
phastCons100way.UCSC.hg38
GenomicAlignments
genomation
VennDiagram
viridis
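The packages above come from two sources, CRAN and Bioconductor. As a convenience, the sketch below (not part of the repository) builds a single R command that installs all of them through BiocManager, which can fetch packages from both sources. The CRAN/Bioconductor split is our own reading of the list, and the helper name r_install_command is hypothetical. BiocManager itself must already be installed, for example with install.packages("BiocManager") in R.

```python
# Split of the package list above by source repository (our assumption).
CRAN_PACKAGES = [
    "data.table", "ggplot2", "ggpubr", "cowplot", "ggthemes", "ggsci",
    "ggforce", "ggExtra", "ggrepel", "scales", "DT", "circlize",
    "VennDiagram", "viridis",
]
BIOC_PACKAGES = [
    "GenomicFeatures", "rtracklayer", "BSgenome.Hsapiens.UCSC.hg38",
    "ggbio", "phastCons100way.UCSC.hg38", "GenomicAlignments", "genomation",
]


def r_install_command(packages):
    """Build an R expression that installs the given packages via BiocManager."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    return f"BiocManager::install(c({quoted}))"


if __name__ == "__main__":
    # Print the command so it can be pasted into an R session.
    print(r_install_command(CRAN_PACKAGES + BIOC_PACKAGES))
```

The printed expression is meant to be pasted into an R console; the script does not run R itself.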

Preliminary steps

RNA-seq (and NET-seq) datasets

Pre-processing, alignment to the human reference genome, and generation of the individual transcriptome assemblies for each dataset were performed with the RNA-seq-pipeline. Supplementary Table 1 lists the accession codes of all datasets used for annotation and validation.

The output files obtained with this procedure should be placed in the following folders:

stringtie: GTF files produced by stringtie;
counts: QoRTs folders (containing the QC.geneCounts.detailed.txt.gz and QC.summary.txt files) and StrandCheck.out.tab files;
RNAseq_bw: stranded (plus and minus) CPM normalised bigWig files;
RNAseq_bam: BAM files (post-deduplication).
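The folder layout above can be checked before running the analysis. This is a minimal sketch under two assumptions that the text does not state: the four folders sit directly under one project root, and each QoRTs folder is a direct sub-folder of counts. The function names are hypothetical.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = ["stringtie", "counts", "RNAseq_bw", "RNAseq_bam"]

# Files each QoRTs folder is expected to contain, as named above.
QORTS_FILES = ["QC.geneCounts.detailed.txt.gz", "QC.summary.txt"]


def check_layout(root="."):
    """Map each expected folder name to True if it exists under root."""
    base = Path(root)
    return {name: (base / name).is_dir() for name in EXPECTED_FOLDERS}


def find_qorts_folders(counts_dir):
    """Return names of sub-folders of counts_dir holding both QoRTs files."""
    return sorted(
        sub.name
        for sub in Path(counts_dir).iterdir()
        if sub.is_dir() and all((sub / f).is_file() for f in QORTS_FILES)
    )


if __name__ == "__main__":
    for folder, present in check_layout(".").items():
        print(f"{folder}: {'ok' if present else 'missing'}")
```

The check covers folder and file names only; it does not validate file contents.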

Reference annotation

Parsing of the Gencode v27 reference annotation and generation of the R objects used in this analysis were performed with the R_Gencode_Reference processing scripts.

The R objects obtained with this procedure should be placed in the GencodeReference folder (the default); otherwise, set the annotationFolder path to their location on the current machine.

Running the analysis

The main code is in R Markdown format (Rmd), which can be opened and executed in RStudio or other compatible editors. It is divided into multiple 'chunks', so the different tasks can be executed step by step.

Final annotation files (BED format)
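For readers who want to inspect BED files outside R, the sketch below parses one line of the standard tab-separated BED layout (chrom, start, end, name, score, strand), in which start is 0-based and end is exclusive. It is an illustration only: how many columns the final annotation files actually carry is an assumption, so missing optional fields fall back to ".". The BedRecord type and the example line are made up for this sketch.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str = "."
    strand: str = "."


def parse_bed_line(line):
    """Parse one tab-separated BED line into a BedRecord."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Columns 4 (name) and 6 (strand) are optional in BED.
    name = fields[3] if len(fields) > 3 else "."
    strand = fields[5] if len(fields) > 5 else "."
    return BedRecord(chrom, start, end, name, strand)


if __name__ == "__main__":
    # Made-up example line, not taken from the annotation files.
    record = parse_bed_line("chr1\t100\t200\texample\t0\t+\n")
    print(record, "length:", record.end - record.start)
```

Because the interval is half-open, the feature length is simply end minus start.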

luslab/IntergenicTranscription