A Test Case for Running Snakemake on ISCA
This repository is a simple test case showing how to get Snakemake to play nicely with ISCA. It consists of 4 important files:
Snakefile
- This is the file that contains the rules that are to be run.
cluster.json
- This describes the parameters for each rule and how they are to be run on the servers.
config.yaml
- This contains the "samples". The Snakefile will run each rule for each sample.
custom_jobscript.sh
- This is a hack file to make it play nicely with ISCA.
The reason you need custom_jobscript.sh is that otherwise the created jobs don't know where conda is, so they can't run the programs you want. You are supposed to be able to use the flag --use-conda to get Snakemake to create an appropriate environment from the envs/test.yaml file assigned to a rule, but in this setup it doesn't work.
The downside is that you will need to edit custom_jobscript.sh to point at your conda install and, on the next line, call conda activate on the appropriate environment. Once that's done, it works.
You can then get it to submit jobs to the server by running:
snakemake --jobs 100 \
--jobscript custom_jobscript.sh \
--cluster-config cluster.json \
--cluster "msub -V -l walltime={cluster.walltime},nodes=1:ppn={cluster.threads} -q {cluster.queue} -A {cluster.project} -j oe"
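To make the {cluster.…} placeholders in the --cluster string concrete, here is a minimal Python sketch of how a per-rule entry in cluster.json might be merged with defaults and substituted into the msub command. It only mimics the substitution and is not Snakemake's own code. The queue names come from the description in this README; the walltime, threads, and project values are made-up placeholders, and the presence of a __default__ block in cluster.json is an assumption.

```python
from types import SimpleNamespace

# Hypothetical contents of cluster.json. Only the queue names are taken
# from this README; every other value is a placeholder.
CLUSTER_CONFIG = {
    "__default__": {
        "walltime": "01:00:00",
        "threads": 1,
        "queue": "single",
        "project": "my-project",
    },
    "test_conda": {"threads": 16, "queue": "parallel"},
}

# The template passed to --cluster in the command above.
CLUSTER_TEMPLATE = (
    "msub -V -l walltime={cluster.walltime},nodes=1:ppn={cluster.threads} "
    "-q {cluster.queue} -A {cluster.project} -j oe"
)


def submit_command(rule_name, config=CLUSTER_CONFIG, template=CLUSTER_TEMPLATE):
    """Return the submission command for one rule (illustration only)."""
    # Rule-specific settings override the defaults.
    settings = {**config.get("__default__", {}), **config.get(rule_name, {})}
    return template.format(cluster=SimpleNamespace(**settings))


if __name__ == "__main__":
    for rule in ("setup", "test_conda"):
        print(rule, "->", submit_command(rule))
```

A rule without its own entry, such as setup here, falls back entirely to the defaults, which is why it ends up in the single queue.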
Snakemake will then figure out that, for each of the 5 samples specified in config.yaml, the test_conda rule is needed to generate the outputs specified in the all rule. Because the input of test_conda is the output of setup, it also figures out that it needs to run setup for each sample.
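The backward reasoning described above can be sketched in a few lines of Python. This is a toy model, not Snakemake's implementation: the sample names and file name patterns are invented for illustration (the real ones live in config.yaml and the Snakefile), and only the rule names setup and test_conda are taken from this README.

```python
# Hypothetical sample names; the real ones live in config.yaml.
SAMPLES = ["sample1", "sample2", "sample3", "sample4", "sample5"]

# Each rule maps an output pattern to the input pattern it needs.
# The file name patterns are invented for this illustration.
RULES = {
    "setup": {"output": "{sample}.setup.txt", "input": None},
    "test_conda": {"output": "{sample}.result.txt", "input": "{sample}.setup.txt"},
}


def jobs_needed(targets, samples=SAMPLES, rules=RULES):
    """Walk backwards from the target files and list (rule, sample) jobs in run order."""
    # Index every file that some rule could produce for some sample.
    produced_by = {
        spec["output"].format(sample=s): (name, s)
        for name, spec in rules.items()
        for s in samples
    }
    ordered = []

    def visit(path):
        rule, sample = produced_by[path]
        needed_input = rules[rule]["input"]
        if needed_input is not None:
            visit(needed_input.format(sample=sample))  # dependencies first
        if (rule, sample) not in ordered:
            ordered.append((rule, sample))

    for target in targets:
        visit(target)
    return ordered


if __name__ == "__main__":
    # The "all" rule simply asks for the final output of every sample.
    all_targets = [f"{s}.result.txt" for s in SAMPLES]
    for rule, sample in jobs_needed(all_targets):
        print(rule, sample)
```

Asking only for the test_conda outputs is enough to pull in one setup job per sample, which is the same effect the all rule has in the Snakefile.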
The --jobs parameter in the above command determines how many jobs can be active at once. For example, with 5 samples and --jobs set to 2, it will submit the task defined in setup for the first two samples to the single queue (as specified in cluster.json). Once these jobs complete, it will then submit the test_conda job to the parallel queue for the first two samples.
Once the first two samples are complete, it will move on to the next two and repeat. If you changed the --jobs flag to 5, it would run setup for all samples at once.
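As a rough illustration of the batching described above, the following Python sketch groups jobs into waves of at most max_jobs. It is a deliberately simplified model of the behaviour described in this README, and Snakemake's real scheduler may interleave jobs differently. The sample names and the submission_waves helper are hypothetical.

```python
def submission_waves(samples, rules, max_jobs):
    """Group (rule, sample) jobs into waves of at most max_jobs jobs each."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    waves = []
    for start in range(0, len(samples), max_jobs):
        batch = samples[start:start + max_jobs]
        for rule in rules:  # rules are listed in dependency order
            waves.append([(rule, sample) for sample in batch])
    return waves


if __name__ == "__main__":
    samples = [f"sample{i}" for i in range(1, 6)]  # hypothetical names
    waves = submission_waves(samples, ["setup", "test_conda"], max_jobs=2)
    for number, wave in enumerate(waves, start=1):
        print(number, wave)
```

With max_jobs=2 the five samples need six waves (setup then test_conda for two samples, two samples, and one sample), whereas max_jobs=5 needs only two.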
You'll notice on line 22 of the Snakefile, you can specify which rules you want to run locally, and which ones you want to turn into cluster jobs. If you uncomment this line, the setup rule will be run locally and only the test_conda rule will create a cluster job.
It's a pretty cool tool for creating reproducible computational pipelines.
Test Case
For this test case, we will use Snakemake to analyse the viral content of the Biller et al. metagenomes.
This assumes the following are installed:
iCommands
- for downloading files from iMicrobes.
VirSorter
- as described here