FR-AgENCODE RNA-seq pipeline
This is the pipeline to process RNA-seq data from raw reads to reference and novel gene and transcript expression used in the FR-AgENCODE project http://www.fragencode.org/. It can be used in the context of other projects but data need to be paired-end and stranded (type mate2_sense). Also, provided pairs of fastq need to correspond to bioreplicates and not to technical replicates or runs.
Follow the instructions below to use this pipeline on genologin.
Prerequisites
Conda and Snakemake are required to install and run this pipeline.
-
Load modules
module purge module load system/Anaconda3-5.2.0 module load bioinfo/snakemake-4.8.0
-
Enable the conda activate command
source /usr/local/bioinfo/src/Anaconda/Anaconda3-5.2.0/etc/profile.d/conda.sh
Installation
-
Download the code
git clone https://github.com/sdjebali/fragencode-rnaseq-mapquantassemble.git
-
(Recommended) Specify the conda environments and packages paths if you want them elsewhere than ~/.conda
conda config --add envs_dirs /path/to/conda/envs conda config --add pkgs_dirs /path/to/conda/pkgs
-
Create the environment in the conda environments path
cd fragencode-rnaseq-mapquantassemble conda env create --file environment.yaml
Usage
-
Configure the samples.tsv, reads.tsv and config.yaml files
-
Activate the conda environment
conda activate mapquantassemble
-
Export the DRMAA library path
export DRMAA_LIBRARY_PATH="/tools/libraries/slurm-drmaa/slurm-drmaa-1.0.7/lib/libdrmaa.so"
-
Run the pipeline
cd fragencode-rnaseq-mapquantassemble snakemake --use-conda --conda-prefix /path/to/conda/envs --debug-dag --jobs 30 --cluster-config cluster.yaml --drmaa " --mem-per-cpu={cluster.mem}000 --mincpus={threads} --time={cluster.time} -J {cluster.name} -N 1=1" --configfile config.yaml -p