Setting memory twice when submitting with slurm executor
Versions
snakemake version 8.10.7
snakemake-executor-plugin-slurm version 0.4.4
snakemake-executor-plugin-slurm-jobstep version 0.2.1
The problem
I am working on getting snakemake version 8 to work on my slurm server and keep getting the following error:
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
I can see that two resource arguments are being passed when looking at the rule description:
[Fri Apr 19 13:42:50 2024]
rule fastqc:
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
jobid: 0
reason: Forced execution
wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB
However, I don't know where the mem_mb is being passed.
Profile
executor: slurm
default-resources:
  slurm_partition: "acompile"
  slurm_account: "amc-general"
set-resources:
  fastqc:
    runtime: 60 # 1 hour
    mem: "16GB"
  fastqc_summary:
    runtime: 10
    mem: "4GB"
My rule
rule fastqc:
    input:
        input_list = _get_input
    output:
        file = "{results}/fastqc_pre_trim/fastqc_{sample}_summary_untrimmed.txt"
    params:
        output_dir = os.path.join(RESULTS2, "fastqc_pre_trim"),
        directories = _get_directories
    resources:
        slurm_extra=lambda wildcards: (
            f"--output={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.out "
            f"--error={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.err "
            f"--qos=compile"
        )
    singularity:
        GENERAL_CONTAINER
    shell:
        """
        mkdir -p {params.output_dir}
        fastqc {input} --outdir {params.output_dir}
        for dir in {params.directories};
        do
            name=$(basename -s .zip $dir)
            unzip -p $dir $name/summary.txt \
            >> {output}
        done
        """
My command
snakemake \
--snakefile Snakefile \
--configfile config.yaml \
--jobs 12 \
--latency-wait 60 \
--rerun-incomplete \
--use-singularity \
--workflow-profile profiles/default
Attempted fix 1: use mem_mb
I have also tried this using the mem_mb argument instead:
executor: slurm
default-resources:
  slurm_partition: "acompile"
  slurm_account: "amc-general"
set-resources:
  fastqc:
    runtime: 60 # 1 hour
    mem_mb: 1600
  fastqc_summary:
    runtime: 10
    mem_mb: 4000
I get the same error, but the double memory request is less obvious:
[Fri Apr 19 13:40:42 2024]
rule fastqc:
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
jobid: 0
reason: Forced execution
wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
resources: mem_mb=1600, mem_mib=1526, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
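The mem_mb / mem_mib and disk_mb / disk_mib pairs in the resource line above look like the same quantity expressed in two units rather than two separate requests. A minimal sketch of that arithmetic follows, assuming a megabyte is 10^6 bytes, a mebibyte is 2^20 bytes, and the result is rounded up. The helper name mb_to_mib is hypothetical and is not Snakemake's API.

```python
def mb_to_mib(megabytes: int) -> int:
    """Convert megabytes (10**6 bytes) to mebibytes (2**20 bytes), rounding up.

    Assumption: this mirrors the relationship between the *_mb and *_mib
    values printed in the log; it is not taken from Snakemake's source.
    """
    total_bytes = megabytes * 10**6
    mebibyte = 2**20
    # Integer ceiling division avoids floating-point rounding issues.
    return (total_bytes + mebibyte - 1) // mebibyte


# Values taken from the resource line in the log above.
print(mb_to_mib(1600))   # compare with mem_mib in the log
print(mb_to_mib(43311))  # compare with disk_mib in the log
```

Both results agree with mem_mib=1526 and disk_mib=41305 in the log, which is consistent with a unit conversion and not with a second memory request.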
Attempted fix 2: remove profile and specify within rule
I have also tried this where I deleted my profile and just assigned the resources within the rule:
rule fastqc:
    input:
        input_list = _get_input
    output:
        file = "{results}/fastqc_pre_trim/fastqc_{sample}_summary_untrimmed.txt"
    resources:
        job_name="fastqc",
        mem_mb=1600,
        runtime=60,
        slurm_extra=lambda wildcards: (
            f"--output={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.out "
            f"--error={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.err "
            f"--qos=compile"
        )
    params:
        output_dir = os.path.join(RESULTS2, "fastqc_pre_trim"),
        directories = _get_directories
    singularity:
        GENERAL_CONTAINER
    shell:
        """
        mkdir -p {params.output_dir}
        fastqc {input} --outdir {params.output_dir}
        for dir in {params.directories};
        do
            name=$(basename -s .zip $dir)
            unzip -p $dir $name/summary.txt \
            >> {output}
        done
        """
Submit with:
snakemake \
--snakefile Snakefile \
--configfile config.yaml \
--jobs 12 \
--latency-wait 60 \
--rerun-incomplete \
--use-singularity \
--executor slurm \
--default-resources slurm_account=amc-general slurm_partition=acompile
But this also fails with the same srun error:
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
Attempted fix 3: submit with sbatch
I've also submitted the job per this issue but that gave the same error as above.
Conclusion
mem_mb is obviously specified somewhere, but I am not sure where to look beyond the profile, rules, and snakemake command. Do you have any ideas what I may be missing? Thanks so much for your help!
However, I don't know where the mem_mb is being passed.
Your requirement should be translated to mem_mb and is sufficient. Snakemake merely lists both resources, but that should be fine, as it only translates into sbatch --mem ... . And indeed, within SLURM, --mem and --mem-per-cpu are mutually exclusive. I will try to track this down. For this, it would be extremely helpful if you run Snakemake with --verbose and attach the output as a file. Also, please state your SLURM version (output of sinfo --version). Thank you.
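The srun message names three environment variables, so one way to narrow the problem down is to check which of them are set in the environment that the job step inherits. The following is a minimal sketch, meant to be run inside the failing job. The function name memory_vars_set is hypothetical and is not part of Snakemake or the SLURM plugin.

```python
import os

# The three variables named in the srun error message.
MEM_VARS = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE")


def memory_vars_set(env=None):
    """Return the SLURM memory variables present in env (default: os.environ)."""
    env = os.environ if env is None else env
    return {name: env[name] for name in MEM_VARS if name in env}


found = memory_vars_set()
if len(found) > 1:
    # More than one variable being set would match the "mutually exclusive" wording.
    print("Conflicting memory settings:", found)
else:
    print("Memory settings found:", found)
```

If more than one variable shows up, the next question is which layer set each of them: the sbatch call, the srun call, or the environment the submission was made from.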
PS: would you be interested in contributing your workflows to the snakemake-workflows catalogue? See https://snakemake.github.io/snakemake-workflow-catalog/ - some of yours look pretty interesting!
Thanks for helping with this!
However, I don't know where the mem_mb is being passed.
Your requirement should be translated to mem_mb and is sufficient. Snakemake merely lists both resources, but that should be fine as it only translates in sbatch --mem ... . And indeed, within SLURM --mem and --mem-per-cpu are mutually exclusive. I will try to track this down. For this, it would be extremely helpful if you run Snakemake with --verbose and attach the output as a file. Also, please state your SLURM version (output of sinfo --version). Thank you.
- The slurm version is 23.02.2
- Here's the output using --verbose from the master:
snakemake --snakefile Snakefile --configfile config.yaml --jobs 12 --latency-wait 60 --rerun-incomplete --use-singularity --workflow-profile profiles/default --verbose
Using workflow specific profile profiles/default for setting default command line arguments.
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: False
SLURM run ID: c846e871-127b-46a2-a3c5-559bfafd7f06
Using shell: /bin/bash
Provided remote nodes: 12
Job stats:
job count
-------------- -------
all 1
fastqc 1
fastqc_summary 1
total 3
Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 12}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 11}
Execute 1 jobs...
[Mon Apr 22 08:49:54 2024]
rule fastqc:
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testin
g_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
jobid: 2
reason: Missing output files: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_
BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/t
esting_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB
General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ign
ore-incomplete', '', '--verbose ', '--rerun-triggers code software-env mtime params input', '', '', '--deployment-method apptainer', '--conda-frontend mamba', '', '', '--apptainer-pr
efix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache', '', '', '--shared-fs-usage source-cache sources storage-local-copies persistence software-deployment input-output', '',
'--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '--configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config
.yaml', '', '', '--latency-wait 60', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs
/snakemake8/bin', '', '', '--set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVt
PTRHQg==', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGly
PXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==', '']
sbatch call: sbatch --job-name c846e871-127b-46a2-a3c5-559bfafd7f06 --output /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule
_fastqc/_/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/%j.log --export=ALL --comment fastqc -A amc-general -p acompile -t 60 --mem
15259 --ntasks=1 --cpus-per-task=1 --output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out
--error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile -D /pl/active/Anschu
tz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm --wrap="/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin/python3.12 -m snakemake --snakefile /pl/
active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/Snakefile --target-jobs 'fastqc:results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snake
make8_slurm/results,sample=control_1' --allowed-rules 'fastqc' --cores all --attempt 1 --force-use-threads --resources 'mem_mb=15259' 'mem_mib=7630' 'disk_mb=43311' 'disk_mib=41305'
--wait-for-files '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/tmp.ch3f265w' '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/test
ing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz' '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz' --force --target-
files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers code software-
env mtime params input --deployment-method apptainer --conda-frontend mamba --apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache --shared-fs-usage source-cache
sources storage-local-copies persistence software-deployment input-output --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /pl/active/Anschutz_BDC/
analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml --latency-wait 60 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /projects/kwell
swrasman@xsede.org/software/anaconda/envs/snakemake8/bin --set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== ba
se64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg== --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ
== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA== --executor slurm-jobstep --jobs 1 --mode remote"
Job 2 has been submitted with SLURM jobid 5783166 (log: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule_fastqc/_/pl/active/A
nschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/5783166.log).
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-04-20T08:00 --endtime now --name c846e871-127b-46a2-a3c5-559bfafd7f0
6
It took: 0.058480262756347656 seconds
The output is:
'5783166|FAILED
'
status_of_jobs after sacct is: {'5783166': 'FAILED'}
active_jobs_ids_with_current_sacct_status are: {'5783166'}
active_jobs_seen_by_sacct are: {'5783166'}
missing_sacct_status are: set()
[Mon Apr 22 08:50:34 2024]
Error in rule fastqc:
message: SLURM-job '5783166' failed, SLURM status is: 'FAILED'For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 2
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testin
g_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
log: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule_fastqc/_/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testi
ng_snakemake8_slurm/results_control_1/5783166.log (check log file(s) for error details)
shell:
mkdir -p /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
fastqc /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/tes
ting_snakemake8_slurm/raw_data/control_1_R2.fastq.gz --outdir /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
for dir in /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R1_fastqc.zip /scratch/alpine/kwel
lswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R2_fastqc.zip;
do
name=$(basename -s .zip $dir)
unzip -p $dir $name/summary.txt >> /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1
_summary_untrimmed.txt
done
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 5783166
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-04-22T084953.851472.snakemake.log
unlocking
removing lock
removing lock
removed all locks
Full Traceback (most recent call last):
File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
dag_api.execute_workflow(
File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
workflow.execute(
File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.
WorkflowError:
At least one job did not complete successfully.
raw_data/control_1_R1.fastq.gz
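The --set-resources and --default-resources values in the sbatch call above are passed as base64//... tokens, which hides what is actually being requested. A small sketch for reading them follows, assuming the part after the base64// prefix is standard base64-encoded UTF-8 text. The helper decode_token is hypothetical and is not a Snakemake function.

```python
import base64

PREFIX = "base64//"


def decode_token(token: str) -> str:
    """Decode a 'base64//...' token as it appears in the verbose log."""
    if not token.startswith(PREFIX):
        raise ValueError(f"not a base64// token: {token!r}")
    return base64.b64decode(token[len(PREFIX):]).decode("utf-8")


# Tokens copied from the --set-resources and --default-resources
# arguments in the log above.
tokens = [
    "base64//ZmFzdHFjOnJ1bnRpbWU9NjA=",
    "base64//ZmFzdHFjOm1lbT0xNkdC",
    "base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk=",
]
for token in tokens:
    print(decode_token(token))
```

Under this assumption the first two tokens decode to fastqc:runtime=60 and fastqc:mem=16GB, and the third to mem_mb=min(max(2*input.size_mb, 1000), 8000). In other words, a default mem_mb expression is passed along with the per-rule mem setting from the profile.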
And the output from the job
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: True
Using shell: /bin/bash
Provided remote nodes: 1
Provided resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305
Resources before job selection: {'mem_mb': 15259, 'mem_mib': 7630, 'disk_mb': 43311, 'disk_mib': 41305, '_cores': 9223372036854775807, '_nodes': 1}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'mem_mb': 15259, 'mem_mib': 0, 'disk_mb': 43311, 'disk_mib': 0, '_cores': 9223372036854775806, '_nodes': 0}
Execute 1 jobs...
[Mon Apr 22 08:50:04 2024]
rule fastqc:
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testin
g_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
jobid: 0
reason: Forced execution
wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_
BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/t
esting_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB
General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ign
ore-incomplete', '', '--verbose ', '--rerun-triggers code mtime input params software-env', '', '', '--deployment-method apptainer', '--conda-frontend mamba', '', '', '--apptainer-pr
efix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache', '', '', '--shared-fs-usage sources source-cache software-deployment input-output persistence storage-local-copies', '',
'--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '--configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config
.yaml', '', '', '--latency-wait 60', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /projects/kwellswrasman@xsede.org/software/anaconda/envs
/snakemake8/bin', '', '', '--set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVt
PTRHQg==', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGly
PXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==']
This job is a group job: False
The call for this job is: srun -n1 --cpu-bind=q --cpus-per-task 1 /projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin/python3.12 -m snakemake --snakefile /pl/acti
ve/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/Snakefile --target-jobs 'fastqc:results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake
8_slurm/results,sample=control_1' --allowed-rules 'fastqc' --cores all --attempt 1 --force-use-threads --resources 'mem_mb=15259' 'mem_mib=7630' 'disk_mb=43311' 'disk_mib=41305' --f
orce --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers
code mtime input params software-env --deployment-method apptainer --conda-frontend mamba --apptainer-prefix /scratch/alpine/kwellswrasman@xsede.org/apptainer_cache --shared-fs-usage
sources source-cache software-deployment input-output persistence storage-local-copies --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /pl/active
/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml --latency-wait 60 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /
projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/bin --set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVud
GltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg== --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXpl
X21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA== --mode remote
Job is running on host: c3cpu-a2-u32-1.rc.int.colorado.edu
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
[Mon Apr 22 08:50:04 2024]
Error in rule fastqc:
jobid: 0
input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testin
g_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
shell:
mkdir -p /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
fastqc /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/tes
ting_snakemake8_slurm/raw_data/control_1_R2.fastq.gz --outdir /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
for dir in /scratch/alpine/kwellswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R1_fastqc.zip /scratch/alpine/kwel
lswrasman@xsede.org/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R2_fastqc.zip;
do
name=$(basename -s .zip $dir)
unzip -p $dir $name/summary.txt >> /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1
_summary_untrimmed.txt
done
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Storing output in storage.
Full Traceback (most recent call last):
File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
dag_api.execute_workflow(
File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
workflow.execute(
File "/projects/kwellswrasman@xsede.org/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.
WorkflowError:
At least one job did not complete successfully.
PS: would you be interested in contributing your workflows to the snakemake-workflows catalogue? See https://snakemake.github.io/snakemake-workflow-catalog/ - some of yours look pretty interesting!
I could definitely do that! I'll add it to my todo list!
Bad news: I cannot reproduce this behaviour. edit: My SLURM version is 23.02.7.
I noticed that you are overwriting --output via slurm_extra. That does not produce the error. Yet, we have our log file at os.path.abspath(f".snakemake/slurm_logs/{group_or_rule}/{wildcard_str}/%j.log"), as reported by the plugin.
My Snakefile is
rule all:
    input: "results/2.out"
rule test1:
    output: "results/2.out"
    #threads: 2
    resources:
        cpus_per_task=2,
        slurm_extra="--output='somewhere_%j.log'"
    shell: "touch results/$SLURM_CPUS_PER_TASK.out"
My profile:
default-resources:
  slurm_partition: "smallcpu"
  slurm_account: "nhr-zdvhpc" #"m2_zdvhpc"
set-resources:
  test1:
    runtime: 5
    mem_mb: 1800
Does this produce the observed error, too?
That's unfortunate that you can't reproduce it.
You are completely correct, using your profile (changing the partition and account) and your Snakefile I get the same error. But the error still occurs when I remove the slurm_extra argument, so it doesn't seem to be coming from overwriting --output.
... I get the same error.
That is not what I wanted to read ;-)
Assuming you have this script:
#!/bin/bash
#SBATCH --mem 100
#SBATCH -A amc-general
#SBATCH -p acompile
#SBATCH -t 5
srun echo "Hello world"
and you run sbatch <this script>. Does your SLURM output contain the error, too? I mean, we observe the call to srun in the jobstep executor to NOT include any memory setting, and weirdly you still see this error.
You are good, that produced the exact same error. Seems to be an issue with my system and not snakemake (probably what you did want to hear!)
I'll reach out to our system administrators. Thank you so much for all of your help!
probably what you did want to hear
Not really. It is some sort of relief, though. I know that it takes effort to update SLURM, if my colleagues are bitten by a bug — but then again, I would be surprised if you are the first to report.
Thanks for the feedback. I will keep this issue open, if you don't mind, and await further feedback. Perhaps it turns out to be a corner case we can mitigate.
Sounds great, we are working on it and have so far figured out that this works:
#!/bin/bash
#SBATCH --mem 100
#SBATCH -A amc-general
#SBATCH -p acompile
#SBATCH -t 5
srun --mem 100 echo "Hello world"
I will let you know if we make any progress.
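The workaround above repeats the memory request on the srun line. As a rough illustration of what the same duplication could look like when an srun call is assembled programmatically, here is a minimal sketch. The function build_srun_call and its parameters are hypothetical and do not reflect the jobstep plugin's actual code; only the srun options themselves (-n1, --cpu-bind=q, --cpus-per-task, --mem) are taken from the log and the script above.

```python
def build_srun_call(cpus_per_task, command, mem_mb=None):
    """Assemble an srun argument list, optionally repeating the memory request.

    Assumption: passing --mem explicitly to srun, as in the script above,
    avoids the conflict on the affected cluster. Whether it has side
    effects on other SLURM versions is not known.
    """
    call = ["srun", "-n1", "--cpu-bind=q", "--cpus-per-task", str(cpus_per_task)]
    if mem_mb is not None:
        call += ["--mem", str(mem_mb)]
    return call + list(command)


# Mirrors the manual test: srun --mem 100 echo "Hello world"
print(build_srun_call(1, ["echo", "Hello world"], mem_mb=100))
```

Keeping the memory option behind a parameter leaves the original call unchanged by default, which would make it easier to compare behaviour across SLURM versions.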
urgh, is redundancy a new hobby of SchedMD or is there a technical reason behind it (just a rhetorical question!)? I need to check a couple (read: two, for I do not have more and ask colleagues to do the same) of SLURM versions when I contribute the duplication into the code. I am not sure whether or where there might be side effects.
Also, as “my” most current version of SLURM is slightly more up to date than yours, I have to presume that this is a quirk of your cluster.
This is likely a quirk of my cluster. We will definitely keep working on our side to see if there are good fixes.
Again, thanks so much for your help!
I might have found the problem... Our cluster is currently going through some growing pains, so the best way to get an interactive job is by starting an interactive vscode session. When I submit the snakemake jobs from within the interactive vscode session I get the error, but I don't when submitting from a normal interactive node.
So the slurm integration seems to work well as long as I'm not running through vscode.
Ah, the issue is that you submit whilst working within a job context. I'm afraid that's not what we designed the plugin for. It should not be an issue either; at least that issue of yours should not arise.
Now, we can certainly detect this and program a fat warning. I wonder, however, whether falling back on the actual SLURM executor instead of the jobstep executor is possible as a reaction. Either way, I will keep this issue open until I have an answer to this question.
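Detecting the job context mentioned above could look roughly like the following. This is a minimal sketch, not the plugin's code: it assumes that SLURM_JOB_ID is present in the environment whenever a process runs inside a SLURM allocation, and the function names are hypothetical.

```python
import os
import warnings


def inside_slurm_job(env=None) -> bool:
    """Return True if the environment looks like it belongs to a running SLURM job."""
    env = os.environ if env is None else env
    return "SLURM_JOB_ID" in env


def warn_if_inside_job(env=None) -> bool:
    """Emit a warning when a workflow is submitted from within a job context."""
    if inside_slurm_job(env):
        warnings.warn(
            "Snakemake appears to be running inside a SLURM job; "
            "inherited SLURM_* variables may conflict with submitted jobs.",
            stacklevel=2,
        )
        return True
    return False


# Example with an explicit mapping instead of the real environment.
print(inside_slurm_job({"SLURM_JOB_ID": "5783166"}))
print(inside_slurm_job({"PATH": "/usr/bin"}))
```

A check like this only reports the situation. Whether to fall back to a different executor in that case is the open question raised above.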