/rnaseq-graft

RNA-seq quantification pipeline used by the eQTL Catalogue

Primary language: Nextflow. License: MIT.

Introduction

eQTL-Catalogue/rnaseq is a bioinformatics analysis pipeline used for processing RNA-sequencing data for the eQTL Catalogue.

The workflow processes raw data from FASTQ inputs (Trim Galore!); aligns the reads (HISAT2); generates gene and exon counts (featureCounts, DEXSeq); quantifies transcript usage (Salmon), transcriptional event usage (txrevise) and splice junction usage (LeafCutter); and checks concordance between genotypes in BAM and VCF files (QTLtools mbv).

The pipeline is built using Nextflow, a bioinformatics workflow tool for running tasks across multiple compute infrastructures in a portable manner. It comes with Docker / Singularity containers, making installation trivial and results highly reproducible.
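As a rough illustration of how container execution is usually switched on in Nextflow, the fragment below uses the standard `docker` and `singularity` configuration scopes inside two profiles. The profile names `docker_example` and `singularity_example` are hypothetical and are not taken from this repository; the profiles actually shipped with the pipeline are described in the docs/ directory.

```groovy
// nextflow.config fragment (illustrative sketch only).
// Profile names below are hypothetical, not the pipeline's own.
profiles {
    docker_example {
        // Run each process inside its Docker container
        docker.enabled = true
    }
    singularity_example {
        // Run each process inside its Singularity container
        singularity.enabled = true
        // Let Nextflow mount host paths into the container automatically
        singularity.autoMounts = true
    }
}
```

A profile defined this way would be selected at launch with Nextflow's `-profile` option, for example `-profile docker_example`.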

Documentation

The eQTL-Catalogue/rnaseq pipeline comes with documentation about the pipeline, found in the docs/ directory:

  1. Installation
  2. Running the pipeline
  3. Running the pipeline with test data

General overview

The schema below shows the high-level structure of the pipeline.

[Figure: eQTL-Catalogue/rnaseq pipeline schema]

Credits

This pipeline is heavily influenced by a much earlier version of the nf-core/rnaseq pipeline, which was originally written for use at the National Genomics Infrastructure, part of SciLifeLab in Stockholm, Sweden, by Phil Ewels (@ewels) and Rickard Hammarén (@Hammarn).

New quantification methods (exon expression, transcript usage, transcriptional event usage and intron-splicing usage) were added by the Alasoo Lab within the OpenTargets eQTL Catalogue project. Please cite the eQTL Catalogue paper if you have used this resource in your research: https://doi.org/10.1038/s41588-021-00924-w

Many thanks to others who have helped out along the way too, including (but not limited to): @Galithil, @pditommaso, @orzechoj, @apeltzer, @colindaven.