Provided cores differ from what's passed on the CLI
Hi,
first, thank you for providing this profile. It's extremely useful.
I'm having a problem launching multiple jobs at the same time. For example, I want to launch 5 jobs simultaneously, each with 64 cores.
If I run `snakemake --cores 64`, I find that the jobs get launched sequentially rather than in parallel. I understand that this is because I requested a "maximum" of 64 cores, so if one job takes all of them, only one job can run at a time.
Now, I wrote a function that is passed to the rules' `threads` directive and multiplies `workflow.cores` by, say, 0.2 (a simplified sketch of the function is further down). So I can pass `snakemake --cores 320` and each rule will be allocated 64 cores. However, I am finding that this scaling somehow gets applied twice ("squared"). What happens is:
- the Snakemake STDOUT (what is shown on the screen) shows the correct number of threads:
```
rule map_reads:
    input: output/mapping/H/catalogue.mmi, output/qc/merged/H_S003_R1.fq.gz, output/qc/merged/H_S003_R2.fq.gz
    output: output/mapping/bam/H/H_S003.map.bam
    log: output/logs/mapping/map_reads/H-H_S003.log
    jobid: 247
    benchmark: output/benchmarks/mapping/map_reads/H-H_S003.txt
    reason: Missing output files: output/mapping/bam/H/H_S003.map.bam
    wildcards: binning_group=H, sample=H_S003
    threads: 64
    resources: tmpdir=/tmp, mem_mb=149952

minimap2 -t 64 > output/mapping/bam/H/H_S003.map.bam
Submitted job 247 with external jobid '35894421'.
```
That looks fine: I want this rule to be launched with 64 cores, and with this setup 5 instances of the rule get launched at the same time.
When I open the job's SLURM log, however, I find that this value of 64 is passed as the "Provided cores" to the job, and is thus multiplied by 0.2 again.
Contents of `slurm-35894421.out`:
```
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 64
Rules claiming more threads will be scaled down.
Select jobs to execute...

rule map_reads:
    input: output/mapping/H/catalogue.mmi, output/qc/merged/H_S003_R1.fq.gz, output/qc/merged/H_S003_R2.fq.gz
    output: output/mapping/bam/H/H_S003.map.bam
    log: output/logs/mapping/map_reads/H-H_S003.log
    jobid: 247
    benchmark: output/benchmarks/mapping/map_reads/H-H_S003.txt
    reason: Missing output files: output/mapping/bam/H/H_S003.map.bam
    wildcards: binning_group=H, sample=H_S003
    threads: 13
    resources: tmpdir=/tmp, mem_mb=149952

minimap2 -t 13 > output/mapping/bam/H/H_S003.map.bam
```
Even worse, my job is allocated 64 cores but only uses 13 (64 * 0.2, rounded). It's really weird to me that the Snakemake output shows the "correct" value while the SLURM log shows the "real" value that was used. Why do they differ?
I am trying to understand what I am doing wrong. If I set a breakpoint in the function used to compute the number of threads, the `workflow.cores` variable is always what I pass on the command line (320), never what shows up in the SLURM log. The function looks roughly like the sketch below.
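This is a minimal sketch of what I mean, not my exact Snakefile; the helper name, the hard-coded 0.2, and the trimmed-down rule (minimap2 flags omitted) are just for illustration:

```python
# Simplified sketch -- helper name, the fixed 0.2 factor, and the trimmed rule
# are illustrative, not my exact Snakefile.

def fraction_of_cores(fraction=0.2):
    """Thread count as a fraction of the cores given via --cores.

    On the submit node workflow.cores is 320 (from `snakemake --cores 320`),
    so this returns 64. Inside the SLURM job the log says "Provided cores: 64",
    and the same expression evaluates to round(64 * 0.2) = 13.
    """
    return max(1, round(workflow.cores * fraction))


rule map_reads:
    input:
        index="output/mapping/{binning_group}/catalogue.mmi",
        r1="output/qc/merged/{sample}_R1.fq.gz",
        r2="output/qc/merged/{sample}_R2.fq.gz",
    output:
        "output/mapping/bam/{binning_group}/{sample}.map.bam",
    log:
        "output/logs/mapping/map_reads/{binning_group}-{sample}.log",
    # evaluated when the Snakefile is parsed, and parsed again inside the job
    threads: fraction_of_cores()
    shell:
        "minimap2 -t {threads} {input.index} {input.r1} {input.r2} > {output} 2> {log}"
```

On the submit node this evaluates to 64, which matches the STDOUT above; inside the SLURM job it appears to be evaluated again against the job's "Provided cores", which is where the 13 seems to come from.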
I tried adding a `nodes: 5` or a `jobs: 5` key to the profile's `config.yaml`, but it doesn't do any good. Is there anything I can modify in the profile to make sure that I can launch as many parallel jobs as possible?
Please let me know what other information I can provide. Thank you very much.
Best,
V
Update:
I found a temporary fix by using Snakemake's `--max-threads` option. However, I am still trying to understand what the best practice for my scenario would be, and why my function call is "doubled", that is, why the SLURM log differs from the Snakemake output: the job's "Provided cores" is set to the number of threads passed to the rule, and the number of threads then gets calculated again from those provided cores. Thank you!
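For completeness, the invocation I'm using with this workaround is along these lines (the profile name is a placeholder, the numbers are from my setup above):

```
snakemake --profile <this-slurm-profile> --cores 320 --max-threads 64
```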