Grape

Grape provides an extensive pipeline for RNA-Seq analyses. It allows the creation of an automated and integrated workflow to manage and analyse RNA-Seq data.

It uses Nextflow as the execution backend. Please check the Nextflow documentation for more information.

Quickstart

Installing Nextflow

Nextflow is distributed as a self-contained executable package, which means that it does not require any special installation procedure.

Installation takes only two easy steps:

1. Download the executable package by copying and pasting the following command into your terminal window: `wget -qO- get.nextflow.io | bash`. This creates the nextflow executable file in the current directory.
2. Optionally, move the nextflow file to a directory listed in your $PATH variable (this only saves you from having to remember and type the full Nextflow path each time you run it).

Tip	If you don't have `wget` installed, you can use the `curl` utility instead by entering the following command: `curl -fsSL get.nextflow.io | bash`

Note	The pipeline requires Nextflow version 0.23.1 or higher. Make sure you have the right version by running `nextflow info`. If you have an older version, you can upgrade as follows:

$ nextflow -self-update
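If you script your setup, the version requirement above can be checked programmatically. The following is a minimal sketch, not part of Grape: the helpers `parse_version` and `version_at_least` are hypothetical names, and the sketch assumes you already have a plain numeric version string (for example, read off the `nextflow info` output by hand). It does not parse that output, and suffixes such as `-SNAPSHOT` are not handled.

```python
from typing import Tuple

MIN_NEXTFLOW = "0.23.1"  # minimum version stated in the note above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '0.23.1' into (0, 23, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def version_at_least(installed: str, required: str = MIN_NEXTFLOW) -> bool:
    """Return True if `installed` is the same as or newer than `required`."""
    have, need = parse_version(installed), parse_version(required)
    # Pad with zeros so that '1.0' compares like '1.0.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    for candidate in ("0.9.0", "0.17.3", "0.23.1", "1.0"):
        print(candidate, version_at_least(candidate))
```

Comparing integer tuples rather than raw strings matters here: as plain strings, "0.9.0" would sort after "0.23.1".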

Setting up the pipeline

Install the pipeline

Using the Nextflow GitHub sharing feature, the pipeline can be easily installed:

$ nextflow pull guigolab/grape-nf
Checking guigolab/grape-nf ...
 downloaded from https://github.com/guigolab/grape-nf.git

Run a test

The pipeline has a quick test profile already configured.

Warning	Docker is required to run the test.

First create a working directory:

$ mkdir test && cd test

Then run the test:

$ nextflow run grape-nf -profile test
N E X T F L O W  ~  version 0.17.3
Launching 'guigolab/grape-nf' - revision: a6147f7add [master]

G R A P E ~ RNA-seq Pipeline

General parameters
------------------
Index file                      : /Users/emilio/.nextflow/assets/guigolab/grape-nf/test-index.txt
Genome                          : /Users/emilio/.nextflow/assets/guigolab/grape-nf/data/genome.fa
Annotation                      : /Users/emilio/.nextflow/assets/guigolab/grape-nf/data/annotation.gtf
Pipeline steps                  : mapping bigwig contig quantification

Mapping parameters
------------------
Tool                            : GEM 1.7.1
Max mismatches                  : 4
Max multimaps                   : 10

Bigwig parameters
-----------------
Tool                            : RGCRG 0.1
References prefix               : chr

Quantification parameters
-------------------------
Tool                            : FLUX 1.6.1
Mode                            : Genome

Execution information
---------------------
Engine                          : local
Use Docker                      : true
Error strategy                  : ignore

Dataset information
-------------------
Number of sequenced samples     : 1
Number of sequencing runs       : 1
Merging                         : none

===============
Output files db -> /Users/emilio/workspace/grape-nf/pipeline.db
===============

[warm up] executor > local
[09/a54cf9] Submitted process > fastaIndex (genome-SAMtools-0.1.19)
[65/89ba24] Submitted process > index (genome-GEM-1.7.1)
[da/39803f] Submitted process > mapping (test1-GEM-1.7.1)
[81/bcd1cf] Submitted process > inferExp (test1-RSeQC-2.3.9)
[a9/686572] Submitted process > quantification (test1-FLUX-1.6.1)
[67/2783a7] Submitted process > contig (test1-RGCRG-0.1)
[87/c103c9] Submitted process > bigwig (test1-RGCRG-0.1)

-----------------------
Pipeline run completed.
-----------------------

Get the pipeline help

The usage message can be seen with the following command:

$ nextflow run grape-nf --help
N E X T F L O W  ~  version 0.17.3
Launching 'guigolab/grape-nf' - revision: a6147f7add [master]

G R A P E ~ RNA-seq Pipeline
----------------------------
Run the GRAPE RNA-seq pipeline on a set of data.

Usage:
    grape-pipeline.nf --index INDEX_FILE --genome GENOME_FILE --annotation ANNOTATION_FILE [OPTION]...

Options:
    --help                              Show this message and exit.
    --index INDEX_FILE                  Index file.
    --genome GENOME_FILE                Reference genome file(s).
    --annotation ANNOTATION_FILE        Reference gene annotation file(s).
    --steps STEP[,STEP]...              The steps to be executed within the pipeline run. Possible values: "mapping", "bigwig", "contig", "quantification". Default: all
    --max-mismatches THRESHOLD          Set maps with more than THRESHOLD error events to unmapped. Default "4".
    --max-multimaps THRESHOLD           Set multi-maps with more than THRESHOLD mappings to unmapped. Default "10".
    --bam-sort METHOD                   Specify the method used for sorting the genome BAM file.
    --add-xs                            Add the XS field required by Cufflinks/Stringtie to the genome BAM file.

SAM read group options:
    --rg-platform PLATFORM              Platform/technology used to produce the reads for the BAM @RG tag.
    --rg-library LIBRARY                Sequencing library name for the BAM @RG tag.
    --rg-center-name CENTER_NAME        Name of sequencing center that produced the reads for the BAM @RG tag.
    --rg-desc DESCRIPTION               Description for the BAM @RG tag.

Configuring the pipeline

Executors

Nextflow provides different executors to run processes on the local machine, on a computational grid or in the cloud without any change to the actual code. By default the local executor is used, but this can be changed through the Nextflow executor configuration.

For example, to run the pipeline on a computing cluster using Sun Grid Engine, you can create a nextflow.config file in your current working directory with something like:

process {
    executor = 'sge'
    queue    = 'my-queue'
    penv     = 'smp'
}

Input data

The pipeline needs as input a tab-separated file containing information about the FASTQ files to be processed. The required columns, in order, are:

 sample
 id
 path
 type
 view

Note	FASTQ files from paired-end data are grouped together by `id`.

Here is an example from the test run:

sample1  test1   data/test1_1.fastq.gz   fastq   FqRd1
sample1  test1   data/test1_2.fastq.gz   fastq   FqRd2

Sample and id can be the same if you don't have or don't know the sample identifiers:

test1  test1   data/test1_1.fastq.gz   fastq   FqRd1
test1  test1   data/test1_2.fastq.gz   fastq   FqRd2
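To sanity-check an index file before launching a run, a minimal Python sketch might look like the following. It assumes exactly the five tab-separated columns listed above and no header line. `read_index` and `group_by_id` are hypothetical helpers, not part of Grape. The grouping by `id` mirrors how paired-end files are grouped, but it is only an illustration, not the pipeline's own code.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List

# Column order of the Grape index file, as listed above.
COLUMNS = ["sample", "id", "path", "type", "view"]


def read_index(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a tab-separated index file into a list of row dicts."""
    rows = []
    for lineno, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(COLUMNS):
            # Typically caused by columns separated with spaces instead of tabs.
            raise ValueError(
                f"line {lineno}: expected {len(COLUMNS)} columns, got {len(fields)}"
            )
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def group_by_id(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collect FASTQ paths per id, keeping file order (read 1 before read 2)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row["id"]].append(row["path"])
    return dict(groups)


if __name__ == "__main__":
    example = (
        "sample1\ttest1\tdata/test1_1.fastq.gz\tfastq\tFqRd1\n"
        "sample1\ttest1\tdata/test1_2.fastq.gz\tfastq\tFqRd2\n"
    )
    print(group_by_id(read_index(io.StringIO(example))))
    # {'test1': ['data/test1_1.fastq.gz', 'data/test1_2.fastq.gz']}
```

Rejecting rows that do not split into exactly five fields catches files saved with spaces instead of tabs. This is an easy mistake to make when copying the examples above.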

Software

The default Grape configuration uses Docker to provision the programs needed for execution. Pre-built Grape containers are publicly available on the Grape page on Docker Hub.

Nextflow also supports Environment Modules. Creating a working configuration for Grape using modules is not yet straightforward. If you need to use modules please contact us directly.

Pipeline profiles

The Grape pipeline can be run using different configuration profiles, which let the user run the analyses with different tools and configurations. To specify a profile, use the -profile Nextflow option.

The following profiles are available at present:

profile	description
gemflux	uses `GEMtools` for the mapping pipeline and `Flux Capacitor` for isoform expression quantification
starrsem	uses `STAR` for mapping and bigwig and `RSEM` for isoform expression quantification
starflux	uses `STAR` for mapping and `Flux Capacitor` for isoform expression quantification

The default profile uses STAR and RSEM and sets the --bam-sort option to samtools.

Run the pipeline

To run the pipeline, first create a working directory and move into it:

$ mkdir grape-pipeline && cd grape-pipeline

Here is a simple example of how you can run the pipeline once you have set it up properly:

nextflow -bg run grape-nf --index input-files.tsv --genome refs/hg38.AXYM.fa --annotation refs/gencode.v21.annotation.AXYM.gtf --rg-platform ILLUMINA --rg-center-name CRG -resume > pipeline.log

By default, pipeline execution stops as soon as one of the processes fails. To change this behaviour, use the errorStrategy directive (http://www.nextflow.io/docs/latest/process.html#errorstrategy) of Nextflow processes. You can also set it on the command line: for example, to ignore errors and keep processing, use -process.errorStrategy=ignore.

It is also possible to run a subset of the pipeline steps using the --steps option. For example, the following command runs only the mapping and quantification steps:

nextflow -bg run grape-nf --steps mapping,quantification --index input-files.tsv --genome refs/hg38.AXYM.fa --annotation refs/gencode.v21.annotation.AXYM.gtf --rg-platform ILLUMINA --rg-center-name CRG > pipeline.log
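When launching many runs, it can help to assemble the command line programmatically. The following minimal sketch uses only the options shown in this README (`-profile`, `--steps`, `--index`, `--genome`, `--annotation`, `-process.errorStrategy=ignore`, `-resume`). `build_command` and `VALID_STEPS` are hypothetical helpers, and the accepted step names are taken from the `--help` output above.

```python
import shlex
from typing import List, Optional, Sequence

# Step names accepted by --steps, as listed in the pipeline's --help output.
VALID_STEPS = ("mapping", "bigwig", "contig", "quantification")


def build_command(
    index: str,
    genome: str,
    annotation: str,
    steps: Optional[Sequence[str]] = None,
    profile: Optional[str] = None,
    resume: bool = False,
    ignore_errors: bool = False,
) -> List[str]:
    """Assemble a grape-nf command line as a list of arguments (hypothetical helper)."""
    unknown = [step for step in (steps or []) if step not in VALID_STEPS]
    if unknown:
        raise ValueError(f"unknown step(s): {', '.join(unknown)}")

    cmd = ["nextflow", "run", "grape-nf"]
    if profile:
        cmd += ["-profile", profile]
    if steps:
        # --steps takes a comma-separated list (STEP[,STEP]... in --help).
        cmd += ["--steps", ",".join(steps)]
    cmd += ["--index", index, "--genome", genome, "--annotation", annotation]
    if ignore_errors:
        # Same effect as typing -process.errorStrategy=ignore by hand.
        cmd.append("-process.errorStrategy=ignore")
    if resume:
        cmd.append("-resume")
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "input-files.tsv",
        "refs/hg38.AXYM.fa",
        "refs/gencode.v21.annotation.AXYM.gtf",
        steps=["mapping", "quantification"],
    )
    print(shlex.join(cmd))
    # nextflow run grape-nf --steps mapping,quantification --index input-files.tsv --genome refs/hg38.AXYM.fa --annotation refs/gencode.v21.annotation.AXYM.gtf
```

To launch the run, pass the list to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting issues. Unlike the `-bg` examples above, this runs Nextflow in the foreground. Read group options such as `--rg-platform` can be appended to the list in the same way.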

Pipeline results

The pipeline compiles a list of output files into the file pipeline.db inside the current working folder. The format of this file is similar to the input, with a few more columns:

 sample
 id
 path
 type
 view
 readType
 readStrand
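For downstream scripting, a minimal sketch for loading pipeline.db follows. It assumes the file is tab-separated like the index file and has exactly the seven columns listed above, with no header row. If your version of the pipeline writes a different set of columns, adjust `DB_COLUMNS`. `load_results` and `select` are hypothetical helper names.

```python
import csv
import os
from typing import Dict, Iterable, List

# Columns of pipeline.db in the order listed above (assumed: no header row).
DB_COLUMNS = ["sample", "id", "path", "type", "view", "readType", "readStrand"]


def load_results(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a tab-separated pipeline.db into a list of row dicts."""
    rows = []
    for lineno, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(DB_COLUMNS):
            raise ValueError(
                f"line {lineno}: expected {len(DB_COLUMNS)} columns, got {len(fields)}"
            )
        rows.append(dict(zip(DB_COLUMNS, fields)))
    return rows


def select(rows: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Return the rows matching every keyword, e.g. select(rows, sample="sample1")."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    db_path = "pipeline.db"  # written to the current working folder by the pipeline
    if os.path.exists(db_path):
        with open(db_path, newline="") as handle:
            results = load_results(handle)
        for row in select(results, sample="sample1"):
            print(row["id"], row["type"], row["view"], row["path"])
```

Because `select` matches on any column by name, the same helper can filter by `view`, `type` or `readStrand` without extra code.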