Grape provides an extensive pipeline for RNA-Seq analyses. It allows the creation of an automated and integrated workflow to manage and analyse RNA-Seq data.
It uses Nextflow as the execution backend. Please check Nextflow documentation for more information.
Nextflow is distributed as a self-contained executable package, this means that it does not require any special installation procedure.
It only needs two easy steps:
-
Download the executable package by copying and pasting the following command in your terminal window:
wget -qO- get.nextflow.io | bash
. It will create thenextflow
main executable file in the current directory. -
Optionally, move the
nextflow
file in a directory accessible by your$PATH
variable (this is only required to avoid to remember and type the Nextflow full path each time you need to run it).
Tip
|
In the case you don’t have wget installed you can use the curl utility instead by entering the following command: curl -fsSL get.nextflow.io | bash
|
Note
|
The pipeline requires Nextflow version 0.23.1 or higher. Make sure you have the right version by running nextflow info . In case you have an older version you can upgrade as follows:
|
$ nextflow -self-update
Using Nextflow github sharing feature the pipeline can be easily installed:
$ nextflow pull guigolab/grape-nf
Checking guigolab/grape-nf ...
downloaded from https://github.com/guigolab/grape-nf.git
The pipeline has a quick test profile already configured.
Warning
|
Docker is required in order to run the test |
First create a working directory:
$ mkdir test && cd test
Then run the test:
$ nextflow run grape-nf -profile test
N E X T F L O W ~ version 0.17.3
Launching 'guigolab/grape-nf' - revision: a6147f7add [master]
G R A P E ~ RNA-seq Pipeline
General parameters
------------------
Index file : /Users/emilio/.nextflow/assets/guigolab/grape-nf/test-index.txt
Genome : /Users/emilio/.nextflow/assets/guigolab/grape-nf/data/genome.fa
Annotation : /Users/emilio/.nextflow/assets/guigolab/grape-nf/data/annotation.gtf
Pipeline steps : mapping bigwig contig quantification
Mapping parameters
------------------
Tool : GEM 1.7.1
Max mismatches : 4
Max multimaps : 10
Bigwig parameters
-----------------
Tool : RGCRG 0.1
References prefix : chr
Quantification parameters
-------------------------
Tool : FLUX 1.6.1
Mode : Genome
Execution information
---------------------
Engine : local
Use Docker : true
Error strategy : ignore
Dataset information
-------------------
Number of sequenced samples : 1
Number of sequencing runs : 1
Merging : none
===============
Output files db -> /Users/emilio/workspace/grape-nf/pipeline.db
===============
[warm up] executor > local
[09/a54cf9] Submitted process > fastaIndex (genome-SAMtools-0.1.19)
[65/89ba24] Submitted process > index (genome-GEM-1.7.1)
[da/39803f] Submitted process > mapping (test1-GEM-1.7.1)
[81/bcd1cf] Submitted process > inferExp (test1-RSeQC-2.3.9)
[a9/686572] Submitted process > quantification (test1-FLUX-1.6.1)
[67/2783a7] Submitted process > contig (test1-RGCRG-0.1)
[87/c103c9] Submitted process > bigwig (test1-RGCRG-0.1)
-----------------------
Pipeline run completed.
-----------------------
The usage message can be seen with the following command:
$ nextflow run grape-nf --help
N E X T F L O W ~ version 0.17.3
Launching 'guigolab/grape-nf' - revision: a6147f7add [master]
G R A P E ~ RNA-seq Pipeline
----------------------------
Run the GRAPE RNA-seq pipeline on a set of data.
Usage:
grape-pipeline.nf --index INDEX_FILE --genome GENOME_FILE --annotation ANNOTATION_FILE [OPTION]...
Options:
--help Show this message and exit.
--index INDEX_FILE Index file.
--genome GENOME_FILE Reference genome file(s).
--annotation ANNOTAION_FILE Reference gene annotation file(s).
--steps STEP[,STEP]... The steps to be executed within the pipeline run. Possible values: "mapping", "bigwig", "contig", "quantification". Default: all
--max-mismatches THRESHOLD Set maps with more than THRESHOLD error events to unmapped. Default "4".
--max-multimaps THRESHOLD Set multi-maps with more than THRESHOLD mappings to unmapped. Default "10".
--bam-sort METHOD Specify the method used for sorting the genome BAM file.
--add-xs Add the XS field required by Cufflinks/Stringtie to the genome BAM file.
SAM read group options:
--rg-platform PLATFORM Platform/technology used to produce the reads for the BAM @RG tag.
--rg-library LIBRARY Sequencing library name for the BAM @RG tag.
--rg-center-name CENTER_NAME Name of sequencing center that produced the reads for the BAM @RG tag.
--rg-desc DESCRIPTION Description for the BAM @RG tag.
Nextflow provides different Executors
to run processes on the local machine, on a computational grid or the cloud without any change to the actual code.
By default a local executor is used, but it can be changed by using Nextflow executors.
For example, to run the pipeline in a computational cluster using Sun Grid Engine you can set up a nextflow.config
file
in your current working directory with something like:
process {
executor = 'sge'
queue = 'my-queue'
penv = 'smp'
}
The pipeline needs as an input a tab separated file containing containing information about the FASTQ
files to be processed. The
needed columns in order are:
1 |
sample |
the sample identifier, used to merge bam files in case multiple runs for the same sample are present |
2 |
id |
the run identifier (e.g. |
3 |
path |
the path to the fastq file |
4 |
type |
the type (e.g. |
5 |
view |
an attribute that specifies the content of the file (e.g. |
Note
|
Fastq files from paired-end data will be grouped together by id .
|
Here is an example from the test run:
sample1 test1 data/test1_1.fastq.gz fastq FqRd1
sample1 test1 data/test1_2.fastq.gz fastq FqRd2
Sample and id can be the same in case you don’t have/know sample identifiers:
sample1 test1 data/test1_1.fastq.gz fastq FqRd1
sample1 test1 data/test1_2.fastq.gz fastq FqRd2
The default Grape configuration uses Docker to provision the programs needed for the execution. Pre-built Grape containers are publicly available at the Grape page in Docker Hub.
Nextflow also supports Environment Modules. Creating a working configuration for Grape using modules is not yet straightforward. If you need to use modules please contact us directly.
The Grape pipeline can be run using different configuration profiles. The profiles essentially allow the user to run the analyses using
different tools and configurations. To specify a profile you can use the -profiles
Nextflow option.
The following profiles are available at present:
profile | description |
---|---|
gemflux |
uses |
starrsem |
uses |
starflux |
uses |
The default profile uses STAR
and RSEM
and set the --bam-sort
option to samtools
.
To run the pipeline first create a working directory and move there:
$ mkdir grape-pipeline && cd grape-pipeline
Here is a simple example of how you can run the pipeline once you set up it properly:
nextflow -bg run grape-nf --index input-files.tsv --genome refs/hg38.AXYM.fa --annotation refs/gencode.v21.annotation.AXYM.gtf --rg-platform ILLUMINA --rg-center-name CRG -resume > pipeline.log
By default the pipeline execution will stop as far as one of the processes fails. To change this behaviour you can use the [errorStrategy directive](http://www.nextflow.io/docs/latest/process.html#errorstrategy) of Nextflow processes. You can also specify it on command line. For example, to ignore errors and keep processing you can use -process.errorStrategy=ignore
.
It is also possible to run a subset of pipeline steps using the option --steps
. For example, the following command will only run the mapping
and quantification
steps:
nextflow -bg run grape-nf --steps mapping,quantification --index input-files.tsv --genome refs/hg38.AXYM.fa --annotation refs/gencode.v21.annotation.AXYM.gtf --rg-platform ILLUMINA --rg-center-name CRG > pipeline.log
The pipeline compiles a list of output files into the file pipeline.db
inside the current working folder. This format of this file is similar to the input with a few more columns:
1 |
sample |
the sample identifier, used to merge bam files in case multiple runs for the same sample are present |
2 |
id |
the run identifier (e.g. |
3 |
path |
the path to the fastq file |
4 |
type |
the type (e.g. |
5 |
view |
an attribute that specifies the content of the file (e.g. |
6 |
readType |
the input data type (either |
7 |
readStrand |
the inferred exepriment strandednes if any (it can be |
Here is an example from the test run:
sample1 test1 /path/to/results/sample1.contigs.bed bed Contigs Paired-End MATE2_SENSE
sample1 test1 /path/to/results/sample1.isoforms.gtf gtf TranscriptAnnotation Paired-End MATE2_SENSE
sample1 test1 /path/to/results/sample1.plusRaw.bw bigWig PlusRawSignal Paired-End MATE2_SENSE
sample1 test1 /path/to/results/sample1.genes.gff gtf GeneAnnotation Paired-End MATE2_SENSE
sample1 test1 /path/to/results/test1_m4_n10.bam bam GenomeAlignments Paired-End MATE2_SENSE
sample1 test1 /path/to/results/sample1.minusRaw.bw bigWig MinusRawSignal Paired-End MATE2_SENSE
The pipeline can be run without a container engine by installing the required software on the local system.
The required tool version for the standard
profile that have been tested so far are the following: