nf-core/configs

MPCDF Nodes (Raven) memory logic needs to be improved for shared nodes

jfy133 opened this issue · 4 comments

I was running nf-core/mag with -profile mpcdf,raven,test_full and was getting "failed to execute to grid scheduler" errors.

Manually submitting the failed job resulted in:

    sbatch: error: job memory limit for shared nodes exceeded. Must be <= 120000 MB
    sbatch: error: Batch job submission failed: Invalid feature specification

We need to adjust the profile's resource settings so this is handled correctly.

Basically, we need to ensure that if a job's resource requests exceed the shared-node limits, the profile bumps them up to at least the minimum requirements of the exclusive nodes (see the config sketch below the partition limits).

i.e., if you go above the shared partition limits [1]

    shared    cpu     36 / 72  in HT mode                     120 GB          < 1       24:00:00

(either 36 cores OR 120 GB), you must go to at least the MINIMUM of the exclusive partition [1]

    exclusive cpu     72 / 144 in HT mode                     240 GB         1-360      24:00:00

i.e., 72 cores AND 240 GB of memory.
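
Below is a minimal sketch of that idea, assuming the standard nf-core resource labels (process_medium, process_high, process_high_memory) and the Slurm executor; it is not the actual nf-core/configs profile, just one way to make sure every request is either within the shared-node ceiling or at/above the exclusive-node minimum:

    // Hypothetical raven profile fragment -- illustrative only, not the
    // current nf-core/configs code.
    process {
        executor = 'slurm'

        // Requests that fit on a shared node stay well inside 36 cores / 120 GB ...
        withLabel:process_medium {
            cpus   = 18
            memory = 60.GB
        }

        // ... anything bigger jumps straight to the exclusive-node minimum
        // (72 cores AND 240 GB), so no job lands in the rejected gap between
        // the shared-node ceiling and the exclusive-node floor.
        withLabel:process_high {
            cpus   = 72
            memory = 240.GB
        }
        withLabel:process_high_memory {
            cpus   = 72
            memory = 240.GB
        }
    }

    params {
        // Upper bounds used by the nf-core check_max() helper
        max_cpus   = 72
        max_memory = 240.GB
        max_time   = 24.h
    }

Pinning labels like this is blunt, but the key constraint is simply that no submission may ask for something between 36 and 72 cores, or between 120 GB and 240 GB; a dynamic closure on the cpus/memory directives could achieve the same per task.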

Footnotes

  1. https://docs.mpcdf.mpg.de/doc/computing/raven-user-guide.html?highlight=ptmp#slurm-batch-system

There also seems to be a maxForks-style limit of 16 jobs on exclusive nodes.
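
If such a limit exists, one way to respect it from the Nextflow side would be to cap concurrent submissions. A sketch only (not the current profile), with the value 16 taken from the observation above:

    // Illustrative only: never have more than 16 jobs queued/running at once.
    executor {
        queueSize = 16
    }

    // Alternatively, maxForks limits parallel instances of a single process:
    // process {
    //     maxForks = 16
    // }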

Have you tried: -profile test_full,mpcdf,raven?

Yes, that also didn't work, but I've been running nf-core/taxprofiler recently without any issues, so I think we can close this for now :)