Snakemake wrapper for monorail.
How to use:
- Remove all `module load singularity` lines from Snakerail, unless your HPC also uses a module system
- Then install singularity with conda
- Clone this repo and monorail-external:
git clone https://github.com/davemcg/Snakerail.git
git clone https://github.com/langmead-lab/monorail-external
- Install reference info and images using monorail scripts:
cd /path/to/ref/folder
bash ~/path/monorail-external/get_unify_refs.sh
bash ~/path/monorail-external/get_human_ref_indexes.sh
singularity pull docker://quay.io/broadsword/recount-unify:1.1.0
singularity pull docker://quay.io/benlangmead/recount-rs5:1.0.6
- ONLY IF YOU HAVE SINGLE END FILES: copy `run_recount_pump_single.sh` from the `src` folder of this repo to the `singularity` folder wherever you cloned the monorail repo. For example:
cp ~/path/to/Snakerail/src/run_recount_pump_single.sh ~/path/monorail-external/singularity/
- Copy and edit the yaml to your working dir
- Create a metadata file in tsv format (used in the yaml as `study_fq`). Example here
- Run (SPECIFIC TO NIH HPC)
bash /path/to/repo/Snakerail/Snakerail/Snakerail.wrapper.sh snakerail_config.yaml
- A bit more generically, you could run something like:
snakemake -s /path/to/this/repo/Snakerail --configfile snakerail_config.yaml
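For the first step in the list above, a minimal sketch of stripping the module lines could look like the following. The function name `strip_module_load` is hypothetical, and the sketch assumes each `module load singularity` command sits on its own line (possibly indented); if Snakerail chains it with other commands on one line, edit those by hand instead.

```shell
#!/usr/bin/env bash
# Print a file with every standalone "module load singularity" line removed.
# Assumption: the command sits on its own line, optionally indented.
strip_module_load() {
  sed '/^[[:space:]]*module load singularity[[:space:]]*$/d' "$1"
}

# Demo on a throwaway file; the rule body below is made up for illustration.
demo=$(mktemp)
printf '%s\n' 'shell:' '    module load singularity' '    singularity exec image.sif run.sh' > "$demo"
strip_module_load "$demo"
rm -f "$demo"
```

The function writes to stdout rather than editing in place, so you can compare the result against the original before overwriting anything.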
Monorail already provides the `pump` and `unify` steps, but they are (at least for me) a bit fiddly to keep track of individually. So this wraps the whole thing in one Snakefile. Essentially you start with a metadata tsv (first column is the study, second column is the fastq prefix, and third column is `single` or `paired` to denote how the sequencing was done) and your fastq files in a folder. It runs `pump`, then moves the outputs into a folder for `unify`. After `unify` finishes, it munges the unify output into an RSE for direct use in recount3.
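As a rough sanity check of the metadata tsv described above, something like the following could be run before starting the workflow. This is only a sketch: `check_metadata` is a hypothetical helper, the study and prefix names are placeholders, and it assumes the file has no header row.

```shell
#!/usr/bin/env bash
# Check a metadata TSV: exactly three tab-separated columns per line,
# with the third column being "single" or "paired".
# Assumption: the file has no header row.
check_metadata() {
  awk -F'\t' '
    NF != 3 { printf "line %d: expected 3 columns, got %d\n", NR, NF; bad = 1; next }
    $3 != "single" && $3 != "paired" {
      printf "line %d: third column is \"%s\", expected single or paired\n", NR, $3
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Hypothetical example file; study and prefix names are placeholders.
meta=$(mktemp)
printf 'study1\tsampleA\tpaired\n' >  "$meta"
printf 'study1\tsampleB\tsingle\n' >> "$meta"
check_metadata "$meta" && echo "metadata looks OK"
rm -f "$meta"
```

The function exits non-zero and prints the offending line numbers when a row has the wrong column count or an unexpected value in the third column.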
This is written for my working use on the NIH HPC, which uses a module system that I abuse instead of rolling my own containers or conda envs. If you do want to run this and are having trouble, let me know. I don't think it would take much more effort to make it more general, but I'm lazy and don't want to optimize further if I am the only one using it.
Monorail has a bug (?) where the script they provide to run it assumes, for a local run, that the data is paired end. `run_recount_pump_single.sh` just tweaks that script lightly to take out the second fq file and move the study name up by one position.