Automated workflow for small RNA sequence data

Snakemake workflow for processing small RNA-seq libaries produced by Illumina small sequencing kits.

Requirments

demultiplex fastq files in located in data directory. They need to be in the form {sample}_R1.fastq.gz
Snakefile shipped with this repository.
config.yaml shipped with this repository. It contains all parameters and settings to customize the processing of the current dataset.
samples.csv listing all samples in the data directory withoug the _R1.fastq.gz suffix. The first line is the header i.e. the work library. An example is shipped with this repository which can be used as a template.
Optionall: environment.yaml to create the software environment if conda is used.
Installation of snakemake and optionally conda
If conda is not used, bowtie, fastqc, samtools and deeptools need to be in the PATH.

The above files can be downloaded as a whole by cloning the repository (which requires git):

git clone https://github.com/seb-mueller/snakemake_sRNAseq.git

Or individually for example the Snakemake file using wget:

wget https://raw.githubusercontent.com/seb-mueller/snakemake_sRNAseq/master/Snakefile

creating conda environment

conda env create --file environment.yaml --name srna_mapping

activate

conda activate srna_mapping

To deactivate the environment, run:

conda deactivate

Update:

git pull
conda env update --file environment.yaml --name srna_mapping

Usage:

Navigate in a Unix shell to the base directory contains the files listed above plus the data directory including the data like int this example:

.
├── data
│   ├── test2_R1.fastq.gz
│   └── test3_R1.fastq.gz
├── config.yaml
├── environment.yaml
├── samples.csv
└── Snakefile

Then just run snakmake in base directory:

# the most basic usage
snakemake
# recommended: automatic conda managment in central location
snakemake --use-conda --conda-prefix ~/.myconda -p

useful parameters:

--cores max number of threads
-n dryrun
-p print commands
--use-conda
--conda-prefix ~/.myconda
--forcerun postmapping forces rerun of a given rule (e.g. postmapping)
--keep-going if for example one sample fails, pipeline will still try to process other samples

Output:

trimmed, log and mapped directory with trimming and mapping results.

Update: added STAR support

# create star index (goes in staridx folder)
snakemake -p --skip-script-cleanup staridx --cores 3
# then map using star
snakemake -p --skip-script-cleanup starmap --cores 3
# TODO: create bw files form STAR mapping