Memory limits
azzaea opened this issue · 2 comments
@jacobrh91: With novosort, I'm running into the error: slurmstepd: Exceeded step memory limit at some point. I think this relates to the way I'm setting up the parameter NOVOSORT_MEMLIMIT=3000000000 in the runfile.
Is there a recommendation as to how much memory should be reserved for novosort? I mean, should I specify it according to the maximum memory limit of the cluster for example, or do I specifically reserve it in the job submission script?
Thank you,
Azza
Hey, as far as novosort goes, I usually set it to 20GB, but that's because I run on a machine with a lot of RAM (256GB). If you run multiple programs on the same node (the PROGRAMS_PER_NODE flag set greater than 1), be sure to take that into account when choosing the novosort memory limit.
On a node with 64GB of RAM and PROGRAMS_PER_NODE=2, I would set NOVOSORT_MEMLIMIT to something like 25GB, so that two novosort processes running side by side would use at most 50GB of memory, leaving plenty of room for Swift/T and the OS itself.
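To make the arithmetic concrete, here is a rough sketch of that calculation in Python. The function name and the 14GB reserve for Swift/T and the OS are made up for illustration (the reserve is picked only so that the 64GB example comes out at 25GB per program), and it assumes the runfile value is given in bytes, as the 3000000000 in the question suggests.

```python
def novosort_memlimit_bytes(node_ram_gb, programs_per_node, reserve_gb):
    """Split the node's RAM, minus a reserve, evenly across concurrent programs.

    Hypothetical helper, not part of the pipeline. reserve_gb is the memory
    left free for Swift/T and the OS; the right value depends on the cluster.
    """
    if programs_per_node < 1:
        raise ValueError("programs_per_node must be at least 1")
    usable_gb = node_ram_gb - reserve_gb
    if usable_gb <= 0:
        raise ValueError("reserve_gb must be smaller than node_ram_gb")
    per_program_gb = usable_gb // programs_per_node  # round down to whole GB
    return per_program_gb * 10**9  # assumed unit: bytes (1 GB = 10^9 bytes)


if __name__ == "__main__":
    # 64GB node, two programs per node, 14GB kept free (assumed figure)
    limit = novosort_memlimit_bytes(64, 2, 14)
    print(f"NOVOSORT_MEMLIMIT={limit}")
```

On a shared node where the scheduler caps the job's memory, the first argument should probably be the amount actually reserved for the job rather than the node's physical RAM.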
Does this explanation help?
Yes, it makes sense.
It suggests that I'm in trouble because the nodes I'm accessing are shared, and the scheduler sets limits on how much memory I'm using. I will try to explicitly reserve memory in the Swift/T settings.sh file, while keeping these calculations and rationale in mind.