/rnaseq

RNA-seq quantification pipeline used by the eQTL Catalogue

Primary language: Nextflow. License: MIT.

Introduction

eQTL-Catalogue/rnaseq is a bioinformatics analysis pipeline used for processing RNA-sequencing data for the eQTL Catalogue.

The workflow processes raw data from FASTQ inputs (Trim Galore!); aligns the reads (HISAT2); generates gene and exon counts (featureCounts, DEXSeq); quantifies transcript usage (Salmon), transcriptional event usage (txrevise) and splice junction usage (leafcutter); and checks concordance between genotypes in BAM and VCF files (qtltools mbv). See the output documentation for more details of the results.
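As a quick reference, the stages listed above can be written down as an ordered stage-to-tool mapping. This is only an illustrative sketch in Python: the names `PIPELINE_STAGES`, `tools_for` and `describe` are hypothetical and are not part of the pipeline, which is implemented in Nextflow.

```python
# Ordered stage -> tool(s) mapping, taken from the workflow description above.
# All identifiers here are illustrative; the real pipeline is a Nextflow workflow.
PIPELINE_STAGES = {
    "raw FASTQ processing": ["Trim Galore!"],
    "alignment": ["HISAT2"],
    "gene and exon counts": ["featureCounts", "DEXSeq"],
    "transcript usage": ["Salmon"],
    "transcriptional event usage": ["txrevise"],
    "splice junction usage": ["leafcutter"],
    "genotype concordance (BAM vs VCF)": ["qtltools mbv"],
}


def tools_for(stage):
    """Return the list of tools named for a given stage."""
    return PIPELINE_STAGES[stage]


def describe():
    """Return one numbered line per stage, in the order listed above."""
    return [
        f"{i}. {stage}: {', '.join(tools)}"
        for i, (stage, tools) in enumerate(PIPELINE_STAGES.items(), start=1)
    ]


if __name__ == "__main__":
    print("\n".join(describe()))
```

Python dictionaries preserve insertion order, so iterating over `PIPELINE_STAGES` follows the same order as the description.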

See optional quantification methods for details.

The pipeline is built using Nextflow, a bioinformatics workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with Docker / Singularity containers, making installation trivial and results highly reproducible.

Documentation

The eQTL-Catalogue/rnaseq pipeline comes with documentation about the pipeline, found in the docs/ directory:

  1. Installation
  2. Running the pipeline
  3. Running the pipeline with test data

General overview

The schema shown below represents the high level structure of the pipeline.

[Figure: pipeline schema (nfcore/rnaseq)]

Credits

This pipeline is heavily influenced by a much earlier version of the nf-core/rnaseq pipeline, which was originally written for use at the National Genomics Infrastructure, part of SciLifeLab in Stockholm, Sweden, by Phil Ewels (@ewels) and Rickard Hammarén (@Hammarn).

New quantification methods (exon expression, transcript usage, transcriptional event usage and intron-splicing usage) were added by the Alasoo Lab within the OpenTargets eQTL Catalogue project. Please cite the eQTL Catalogue paper if you have used this resource in your research: https://doi.org/10.1038/s41588-021-00924-w

Many thanks to others who have helped out along the way too, including (but not limited to): @Galithil, @pditommaso, @orzechoj, @apeltzer, @colindaven.