RNA-seq analysis pipeline

This is a Snakemake-based pipeline for RNA-seq analysis used in the Tumor Genome Core Analysis, housed in the Cancer Center Amsterdam at Amsterdam UMC, location VUmc, and part of the Department of Pathology.

The pipeline processes raw FastQ input (FastQC, Trimmomatic), aligns the reads (STAR), generates gene counts (featureCounts) and performs quality control on the results (MultiQC). Both paired-end (PE) and single-read (SR) data are supported.

Installation

The pipeline is primarily used in a Linux environment with Conda or Singularity available.

Using Conda

Step 1: Installing Miniconda 3

First, open a terminal or make sure you are logged into your Linux VM. Assuming a 64-bit system, download and install Miniconda 3 on Linux with:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

On macOS, download and install with:

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh

Step 2: Downloading repository & creating environment

mkdir snakemake_RNAseq
cd snakemake_RNAseq
git clone https://github.com/tgac-vumc/RNA-seq
conda env create --name RNAseq --file RNA-seq/env.yaml
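
After creating the environment, it can help to confirm that the expected tools are visible before running anything. The snippet below is a minimal sketch: check_tools is a hypothetical helper, not part of the pipeline, and the example call assumes that env.yaml installs Snakemake into the RNAseq environment.

```shell
# Report whether each named command is available on the PATH.
# Returns 0 when every command is found, 1 when at least one is missing.
# check_tools is a hypothetical helper, not part of the pipeline.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use, after activating the environment with `conda activate RNAseq`
# (assumes env.yaml installs Snakemake into that environment):
#   check_tools conda snakemake
```

A non-zero exit status means at least one tool was not found, which usually indicates that the environment has not been activated in the current shell.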

Using Singularity

The Singularity container provides a CentOS 7 environment and can be pulled with:

singularity pull shub://tgac-vumc/RNA-seq

Path Configuration & Running the pipeline

Before running the pipeline, open config.yaml. It contains two sections: Path Configuration and Software Options.

Under Path Configuration, first choose whether your data is PE or SR, then change the fastq path to the directory where your fastq files are actually stored.
Under Software Options, you will find several settings that can be modified by the user. Please review them before running the pipeline.
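
Before setting the fastq path in config.yaml, it may be worth checking that the directory really contains FastQ files. The following is a rough sketch only: count_fastq is a hypothetical helper, and the list of file suffixes is an assumption, since the pipeline may expect one specific naming scheme.

```shell
# Count FastQ files located directly inside a directory (no recursion).
# The accepted suffixes are an assumption; adjust them to match your data.
# count_fastq is a hypothetical helper, not part of the pipeline.
count_fastq() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.fastq' -o -name '*.fq' \
           -o -name '*.fastq.gz' -o -name '*.fq.gz' \) \
        | wc -l | tr -d ' '
}

# Example (replace the path with your own fastq directory):
#   count_fastq /path/to/fastq
```

A result of 0 suggests that the path is wrong or that the files use a suffix not covered by the patterns above.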

All the software used in the pipeline is installed by conda or executed in a wrapper. We recommend running the pipeline from a directory other than the pipeline directory, as in the example below:

snakemake -s PATH_TO_PIPELINE/Snakefile --use-conda --cores=24

With the --use-conda option, the pipeline creates the environments needed to run its rules based on env.yaml. Note that the pipeline assumes config.yaml is available in the directory from which the pipeline is executed.
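
Since config.yaml must be present in the directory where the pipeline is executed, one way to set up a separate run directory is sketched below. prepare_run_dir is a hypothetical helper, and the sketch assumes that a template config.yaml sits at the top level of the pipeline directory.

```shell
# Create a run directory and copy config.yaml into it from the pipeline
# directory. An existing config.yaml in the run directory is left untouched,
# so earlier edits are not overwritten.
# prepare_run_dir is a hypothetical helper, not part of the pipeline.
prepare_run_dir() {
    pipeline_dir=$1
    run_dir=$2
    mkdir -p "$run_dir" || return 1
    if [ ! -e "$run_dir/config.yaml" ]; then
        cp "$pipeline_dir/config.yaml" "$run_dir/config.yaml" || return 1
    fi
}

# Example, followed by a dry run (-n) that only lists the planned jobs:
#   prepare_run_dir PATH_TO_PIPELINE my_run
#   cd my_run
#   snakemake -s PATH_TO_PIPELINE/Snakefile --use-conda --cores=24 -n
```

The dry run is a cheap way to check that the paths in config.yaml resolve to the expected samples; dropping -n starts the actual run.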