MPCDF Nodes (Raven) memory logic needs to be improved for shared nodes
jfy133 opened this issue · 4 comments
jfy133 commented
I was running nf-core/mag with `-profile mpcdf,raven,test_full` and jobs were failing to be submitted to the grid scheduler.
Manually submitting the failed job resulted in
```
sbatch: error: job memory limit for shared nodes exceeded. Must be <= 120000 MB
sbatch: error: Batch job submission failed: Invalid feature specification
```
We need to adjust the resource settings in the config so that such jobs are requested correctly.
jfy133 commented
Basically we need to ensure that if a job's resource requests go above the shared node configuration, the config switches it to the minimum requirements of the exclusive nodes.

i.e., if you go above either 36 cores OR 120 GB (the shared-node limits), you must request a MINIMUM of 72 cores AND 240 GB of memory (the exclusive-node minimum).

For reference, the Raven node limits:

| Node type | CPUs | Memory | Nodes per job | Max runtime |
| --- | --- | --- | --- | --- |
| shared | 36 (72 in HT mode) | 120 GB | < 1 | 24:00:00 |
| exclusive | 72 (144 in HT mode) | 240 GB | 1-360 | 24:00:00 |
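As a rough illustration only (not the actual nf-core/configs `mpcdf.config`), the switch could look something like the sketch below inside the raven profile; the helper name `bump_to_exclusive`, the limits, and the example label are assumptions made up for this sketch:

```groovy
// Hypothetical helper for a Nextflow institutional config -- a sketch only,
// assuming the shared/exclusive limits from the table above.
def bump_to_exclusive(cpus, memory) {
    def shared_max_cpus   = 36        // shared-node ceiling (assumed from table)
    def shared_max_memory = 120.GB
    def excl_min_cpus     = 72        // exclusive-node minimum (assumed from table)
    def excl_min_memory   = 240.GB

    // If either request no longer fits on a shared node, raise BOTH cpus and
    // memory to at least the exclusive-node minimum so sbatch accepts the job.
    if (cpus > shared_max_cpus || memory > shared_max_memory) {
        return [ Math.max(cpus, excl_min_cpus),
                 memory > excl_min_memory ? memory : excl_min_memory ]
    }
    return [ cpus, memory ]
}

process {
    withLabel: 'process_high' {
        // Example: 48 CPUs / 200 GB exceeds the shared ceiling, so the request
        // is bumped to 72 CPUs / 240 GB and the job targets an exclusive node.
        cpus   = { bump_to_exclusive(48, 200.GB)[0] }
        memory = { bump_to_exclusive(48, 200.GB)[1] }
    }
}
```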
jfy133 commented
There also seems to be a maxForks-style limit of 16 jobs on the exclusive nodes.
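If that limit holds, a blunt way to respect it would be to cap the SLURM executor queue in the same profile; this is only a sketch, assuming the 16-job cap applies to everything the pipeline submits:

```groovy
// Sketch only: keep at most 16 jobs queued or running at any one time,
// assuming the 16-job limit applies to all jobs the pipeline submits.
executor {
    name      = 'slurm'
    queueSize = 16
}
```

A per-process `maxForks = 16` would also work, but it only limits parallel instances of a single process rather than the overall number of submitted jobs.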
maxulysse commented
Have you tried: `-profile test_full,mpcdf,raven`?
jfy133 commented
Yes, that also didn't work, but I've been running some things with taxprofiler recently without any issue, so I think we can close this for now :)