PySlurm/pyslurm

Particular value of gres is causing cstr to crash

robgics opened this issue · 1 comments

Details

  • Slurm Version: 23.02.5
  • Python Version: 3.6.8
  • Cython Version: 3.0.0
  • PySlurm Branch: main
  • Linux Distribution: RHEL 8.8

Issue

pyslurm was crashing when trying to get the value for gres_per_node from a job. I was able to track down the job, and it was created with this salloc command:

salloc -N 1 --gres=gpu

This works because Slurm treats the type and count as optional. If the count is not specified, it defaults to 1.

However, the to_gres_dict function in cstr assumes that if the gres string isn't null, it will contain a ":". As a result, when it indexes into the split string, this error happens:

Traceback (most recent call last):
  File "./slurm_jobs_to_graphite.py", line 283, in
    get_data()
  File "./slurm_jobs_to_graphite.py", line 118, in get_data
    tres = job.gres_per_node
  File "pyslurm/core/job/job.pyx", line 1141, in pyslurm.core.job.job.Job.gres_per_node.get
  File "pyslurm/utils/cstr.pyx", line 229, in pyslurm.utils.cstr.to_gres_dict
IndexError: list index out of range

The line numbers might be a little off, as I added some code for debugging. The line in cstr is this:

name, typ, cnt = gres_splitted[0], gres_splitted[1], 0

I printed out gres_splitted right before this line, and it had the value ['gpu'], hence the index out of range.
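For illustration, here is a rough sketch of how a single gres entry could be split without assuming a ":" is present. This is not pyslurm's actual code or the eventual fix; parse_gres_entry is a hypothetical helper, and it assumes entries look like name, name:count, name:type, or name:type:count, with the count defaulting to 1 as described above. It does not handle suffixes such as "(IDX:0-1)" or a "gres:" prefix.

```python
def parse_gres_entry(entry):
    """Split one gres entry into (name, type, count).

    Hypothetical helper for illustration only, not part of pyslurm.
    Assumes the entry has the shape name[[:type]:count].
    """
    parts = entry.split(":")
    name = parts[0]
    typ = None
    count = 1  # default count when none is given

    if len(parts) == 2:
        # Two fields: the second is either a count or a type.
        if parts[1].isdigit():
            count = int(parts[1])
        else:
            typ = parts[1]
    elif len(parts) >= 3:
        # Three fields: name, type, count.
        typ = parts[1]
        if parts[2].isdigit():
            count = int(parts[2])

    return name, typ, count


print(parse_gres_entry("gpu"))          # ('gpu', None, 1)
print(parse_gres_entry("gpu:2"))        # ('gpu', None, 2)
print(parse_gres_entry("gpu:tesla:2"))  # ('gpu', 'tesla', 2)
```

The point is just that a bare "gpu" yields a one-element list after splitting, so any indexing past element 0 needs a length check first.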

Hi @robgics

Thanks for reporting. It should be fixed now with #334 merged into main.