nf-core/configs

MPCDF Nodes (Raven) memory logic needs to be improved for shared nodes

jfy133 opened this issue · 4 comments

I was running nf-core/mag with -profile mpcdf,raven,test_full and was getting "failed to execute to grid scheduler" errors.

Manually submitting the failed job resulted in:

    sbatch: error: job memory limit for shared nodes exceeded. Must be <= 120000 MB
    sbatch: error: Batch job submission failed: Invalid feature specification

We need to adjust the profile's resource settings so this is handled correctly.

Basically, we need to ensure that if a job's resource requests exceed the shared-node limits, the profile bumps them up to at least the minimum requirements of the exclusive nodes (see the config sketch below the partition limits).

i.e., if you go above the shared partition limits [1]

    shared    cpu     36 / 72  in HT mode                     120 GB          < 1       24:00:00

(either 36 cores OR 120 GB), you must go to at least the MINIMUM of the exclusive partition [1]

    exclusive cpu     72 / 144 in HT mode                     240 GB         1-360      24:00:00

i.e., 72 cores AND 240 GB of memory.
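
Below is a minimal sketch of that idea, assuming the standard nf-core resource labels (process_medium, process_high, process_high_memory) and the Slurm executor; it is not the actual nf-core/configs profile, just one way to make sure every request is either within the shared-node ceiling or at/above the exclusive-node minimum:

    // Hypothetical raven profile fragment -- illustrative only, not the
    // current nf-core/configs code.
    process {
        executor = 'slurm'

        // Requests that fit on a shared node stay well inside 36 cores / 120 GB ...
        withLabel:process_medium {
            cpus   = 18
            memory = 60.GB
        }

        // ... anything bigger jumps straight to the exclusive-node minimum
        // (72 cores AND 240 GB), so no job lands in the rejected gap between
        // the shared-node ceiling and the exclusive-node floor.
        withLabel:process_high {
            cpus   = 72
            memory = 240.GB
        }
        withLabel:process_high_memory {
            cpus   = 72
            memory = 240.GB
        }
    }

    params {
        // Upper bounds used by the nf-core check_max() helper
        max_cpus   = 72
        max_memory = 240.GB
        max_time   = 24.h
    }

Pinning labels like this is blunt, but the key constraint is simply that no submission may ask for something between 36 and 72 cores, or between 120 GB and 240 GB; a dynamic closure on the cpus/memory directives could achieve the same per task.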

Footnotes

  1. https://docs.mpcdf.mpg.de/doc/computing/raven-user-guide.html?highlight=ptmp#slurm-batch-system

There also seems to be a maxForks-style limit of 16 jobs on exclusive nodes.
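
If such a limit exists, one way to respect it from the Nextflow side would be to cap concurrent submissions. A sketch only (not the current profile), with the value 16 taken from the observation above:

    // Illustrative only: never have more than 16 jobs queued/running at once.
    executor {
        queueSize = 16
    }

    // Alternatively, maxForks limits parallel instances of a single process:
    // process {
    //     maxForks = 16
    // }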

Have you tried: -profile test_full,mpcdf,raven?

Yes, that also didn't work, but I've been running nf-core/taxprofiler recently without any issues, so I think we can close this for now :)