Repository of the code used in the Intergenic Transcription manuscript

GNU General Public License v3.0 (GPL-3.0)

Intergenic Transcription

Code to reproduce the analysis and figures for "Intergenic RNA mainly derives from nascent transcripts of known genes" (bioRxiv, 2020).

Requirements

  • stringtie
  • igvtools
  • gffcompare
  • R

Required R packages:

  • GenomicFeatures
  • rtracklayer
  • data.table
  • ggplot2
  • ggpubr
  • cowplot
  • ggthemes
  • ggsci
  • ggforce
  • ggExtra
  • ggrepel
  • scales
  • DT
  • circlize
  • BSgenome.Hsapiens.UCSC.hg38
  • ggbio
  • phastCons100way.UCSC.hg38
  • GenomicAlignments
  • genomation
  • VennDiagram
  • viridis

Preliminary steps

RNA-seq (and NET-seq) datasets

Pre-processing, alignment to the human reference genome, and generation of the individual transcriptome assemblies for each dataset were performed with the RNA-seq-pipeline. Supplementary Table 1 lists the accession codes of all datasets used for annotation and validation.

The output files obtained with this procedure should be placed in the following folders:

  • stringtie: GTF files produced by stringtie;
  • counts: QoRTs folders (containing the QC.geneCounts.detailed.txt.gz and QC.summary.txt files) and StrandCheck.out.tab files;
  • RNAseq_bw: stranded (plus and minus) CPM normalised bigWig files;
  • RNAseq_bam: BAM files (post-deduplication).
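
Before running the analysis, it can help to confirm that the four input folders listed above exist and are not empty. The following is a minimal sketch in Python, not part of the repository: the function name `missing_input_folders` and the choice of the current directory as the base are assumptions made for illustration, and only the folder names come from the list above.

```python
from pathlib import Path

# Folder names taken from the list above; the base directory is an assumption.
EXPECTED_FOLDERS = ("stringtie", "counts", "RNAseq_bw", "RNAseq_bam")


def missing_input_folders(base_dir="."):
    """Return the expected input folders that are absent or empty under base_dir.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    base = Path(base_dir)
    missing = []
    for name in EXPECTED_FOLDERS:
        folder = base / name
        # A folder is reported if it does not exist or contains no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_input_folders(".")
    if problems:
        print("Missing or empty input folders:", ", ".join(problems))
    else:
        print("All expected input folders are present.")
```

The check looks only at folder presence, not at file contents, so it does not verify that the GTF, QoRTs, bigWig, or BAM files themselves are complete.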

Reference annotation

Parsing of the Gencode v27 reference annotation and generation of the R objects used in this analysis were performed with the R_Gencode_Reference processing scripts.

The R objects obtained with this procedure should be placed in the GencodeReference folder (default); otherwise, set the annotationFolder path to their location on the current machine.

Running the analysis

The main code is in R Markdown format (Rmd), which can be opened and executed in RStudio or other compatible editors. It is subdivided into multiple 'chunks', so the different tasks can be executed step by step.

Final annotation files (BED format)

All identified TUs

  1. Gencode v27 + Intergenic TUs (all)

TUs expressed in HeLa cells

  1. Gencode v27 + Intergenic TUs (HeLa)

TUs selected for metaprofile analysis

  1. Metaprofiles Proximal TUs

  2. Metaprofiles Linker TUs

  3. Metaprofiles Independent TUs
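
For readers who want to inspect the BED files outside R, the sketch below shows one way to parse BED lines in Python. It assumes standard tab-separated BED with at least three columns (chromosome, start, end), with name and strand in columns 4 and 6 when present. The column layout of the files in this repository is not stated here, so treat the field handling as an assumption; `BedRecord` and `read_bed` are hypothetical names introduced for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # exclusive (BED convention)
    name: str    # "." when the file has no name column
    strand: str  # "." when the file has no strand column


def read_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield one BedRecord per data line, skipping blank and header lines.

    Assumes tab-separated standard BED; this is an illustrative sketch,
    not the parser used in the analysis.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else "."
        strand = fields[5] if len(fields) > 5 else "."
        yield BedRecord(fields[0], int(fields[1]), int(fields[2]), name, strand)
```

Any open text file handle can be passed to `read_bed`, for example `with open(path) as handle: records = list(read_bed(handle))`, where `path` points to one of the downloaded BED files.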