pestat -G does not work anymore
Closed this issue · 14 comments
The GRES information is obtained from the sinfo command. Which Slurm version are you using?
Please check the output of the sinfo command (as it is used by pestat):
sinfo -N -o "%N %P %C %O %m %e %t %Z %G" -p gpu-long
Update: I see that the per-job GRES/job is missing, while the GRES/node is OK.
It seems that the squeue command option to print the job GRES is undocumented.
I have opened Slurm bug https://bugs.schedmd.com/show_bug.cgi?id=11239 about this.
Can you please download the latest pestat version and test that it prints GRES/job correctly?
Hello,
I'll add to this issue because I've seen a similar problem with the -G flag, where it will print the GRES/node correctly but GRES/job is displayed as "N/A".
If I manually change pestat's squeue -O call from "tres-per-node" to "tres-per-job", it displays correctly (e.g. as "gpu[:type]:4").
Might be that tres-per-node not being populated in squeue output is only a problem when allowing non-exclusive jobs (i.e. SelectType=select/cons_tres, SelectTypeParameters=CR_Core_Memory)?
I should note we're still on SLURM version 20.02.5, so if tres-per-node/tres-per-job output behaves differently in newer versions, please disregard.
I just tested this, and on our cluster tres-per-node works while tres-per-job doesn't:
$ squeue -j 4303832 -O 'State: ,JobID: ,tres-per-node: ,NodeList: '
STATE JOBID TRES_PER_NODE NODELIST
RUNNING 4303832 gpu:RTX3090:8 s006
$ squeue -j 4303832 -O 'State: ,JobID: ,tres-per-job: ,NodeList: '
STATE JOBID TRES_PER_JOB NODELIST
RUNNING 4303832 N/A s006
This job is a non-exclusive job. We run 20.11.8.
Strange! Exactly the opposite for me:
$ squeue -j 23221 -O'State: ,JobID: ,tres-per-node: ,NodeList:'
STATE JOBID TRES_PER_NODE NODELIST
RUNNING 23221 N/A gpunode03
$ squeue -j 23221 -O'State: ,JobID: ,tres-per-job: ,NodeList:'
STATE JOBID TRES_PER_JOB NODELIST
RUNNING 23221 gpu:tesla:1 gpunode03
The only other reason I could think of besides SLURM version changes from 20.02 to 20.11 is that we use the OpenHPC package versions (ohpc-slurm-*), which might be slightly modified compared to the "main branch"; I will check this issue again whenever we get to do an update.
For the time being, I'll just modify my private version of pestat to use tres-per-job - anyway, thank you for providing these tools, in my opinion they are incredibly helpful for managing a SLURM cluster!
It would seem that squeue's tres-per-* have changed from 20.02 to 20.11. I've opened a bug with SchedMD support to find out why 20.11 behaves as shown above, see https://bugs.schedmd.com/show_bug.cgi?id=13007
IMHO, OpenHPC always lags behind Slurm's 9-month release schedule, and your 20.02 is no longer supported, so it no longer receives security fixes either.
SchedMD has clarified the confusion in https://bugs.schedmd.com/show_bug.cgi?id=13007.
It turns out that squeue's tres-per-* output reflects what the user requested when submitting the job, so you can't know in advance which tres-per-* field will be populated :-( Maybe there exists a more general way?
In bug 13007 SchedMD recommends using "tres-alloc" to get all allocated resources.
Good to know! The explanation for the tres-per-* fields makes sense... tres-alloc seems to be the correct choice then, although it makes the pestat output rather noisy without further filtering. Maybe I'll make some modifications myself to try and find a concise way to display that info - if I find the time and manage to produce something usable, I'll let you know.
I've updated the pestat tool in GitHub now so that it parses the tres-alloc field and extracts the gres/gpu: variable.
Could you please test if the new version works correctly on your cluster?
Do you have any comments on the formatting of pestat's output?
Thanks,
Ole
Hi,
it generally seems to work - a small nitpick is that the filtering doesn't catch "generic" gpu reservations, i.e. those with just "--gpus=" instead of "--gpus=type:", as those only show up in tres-alloc as "gres/gpu=" and get filtered out when searching for "gres/gpu:" (with colon).
Since the "gres/gpu=" string shows up both when the type is specified and when it isn't, IMO there would be two ways to handle this:
- Only display gres/gpu=, omit entries for the type
- Filter for both occurrences, throw away the "generic" entry if we find a specific type
The first one throws away some information (on a cluster with several GPU types, it might be interesting which of these are explicitly requested), while the second one complicates the code a little bit further - although as long as we can assume that the gres/gpu: entry always follows later in the tres-alloc string than the generic gres/gpu= entry, it should be quite straightforward to just overwrite the GRES var, something like:
[...]
    if (index(treslist[i], "gres/gpu=") > 0) {
        # Omit the "gres/gpu=" string and start at char 10:
        GRES = "gpu:" substr(treslist[i],10)
    }
    if (index(treslist[i], "gres/gpu:") > 0) {
        # Omit the "gres/gpu:" string and start at char 10, overwrite generic entry:
        GRES = substr(treslist[i],10)
        continue
    }
[...]
Thanks a lot for your comments! I didn't know the distinction between "gres/gpu=" and "gres/gpu:". I've added something like your suggestion, but now start the string at char 6. Does that look good?
Hello,
yes, looks great! As far as I can tell, the GRES output is now correctly displayed for all entries.
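For readers following along, here is a minimal sketch of the filtering discussed above. It is not the code from pestat itself: the function name extract_gpu_gres is hypothetical, and the sample tres-alloc strings are illustrative rather than copied from a real cluster. It starts the output at char 6 (dropping only the "gres/" prefix), prefers a typed "gres/gpu:" entry over the generic "gres/gpu=" one, and does not rely on the order in which the two appear.

```shell
# Hypothetical helper, not taken from pestat: pick the GPU GRES out of a
# comma-separated tres-alloc string passed as the first argument.
extract_gpu_gres() {
    printf '%s\n' "$1" | awk '{
        n = split($0, treslist, ",")
        GRES = ""
        for (i = 1; i <= n; i++) {
            if (index(treslist[i], "gres/gpu:") == 1) {
                # Typed entry: strip the 5-character "gres/" prefix.
                # It always replaces a generic entry seen earlier.
                GRES = substr(treslist[i], 6)
            } else if (index(treslist[i], "gres/gpu=") == 1 && GRES == "") {
                # Generic entry: keep it only if nothing was found so far.
                GRES = substr(treslist[i], 6)
            }
        }
        # Same placeholder that squeue prints for empty fields.
        if (GRES == "") GRES = "N/A"
        print GRES
    }'
}

# Illustrative inputs (assumed format):
extract_gpu_gres 'cpu=8,mem=64G,node=1,billing=8,gres/gpu=4'
extract_gpu_gres 'cpu=8,mem=64G,node=1,billing=8,gres/gpu=1,gres/gpu:tesla=1'
```

With these inputs the first call prints "gpu=4" and the second prints "gpu:tesla=1". If a job holds several typed entries, this sketch keeps only the last one, so a real implementation may want to collect them all.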
Thanks a lot for your testing and feedback!
I will post a message about the updates to the slurm-users mailing list.