reframe-hpc/reframe

Querying the pending job reason with `squeue` may fail and erroneously report the test job as a failure


Here's the test's stderr:

  * Reason: spawned process error: command 'squeue -h -j 12962 -o %r' failed with exit code 1:
--- stdout ---
--- stdout ---
--- stderr ---
slurm_load_jobs error: Invalid job id specified
--- stderr ---

In the past, `squeue` didn't fail when the job id passed to it didn't exist; now it exits with a non-zero code. The error comes from this part of the code:

completed = _run_strict('squeue -h -j %s -o %%r' % job.jobid)

We should also ignore `squeue`'s failure and assume that the job has finished.
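
A minimal sketch of what that could look like, assuming the query is issued through `subprocess.run` rather than ReFrame's own `_run_strict` helper. The function name `pending_reason` and its `run` parameter are hypothetical and only serve the illustration; the `squeue` command line is the one from the error above.

```python
import subprocess


def pending_reason(jobid, run=subprocess.run):
    """Return the pending reason reported by squeue, or None.

    Hypothetical helper, not ReFrame's actual code. A non-zero exit code
    from squeue (e.g. "Invalid job id specified") is not treated as an
    error: we assume the job has already left the queue.
    """
    cmd = ['squeue', '-h', '-j', str(jobid), '-o', '%r']
    completed = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                    universal_newlines=True)
    if completed.returncode != 0:
        # squeue no longer knows this job id; assume the job has finished
        return None

    # An empty reply also means there is no pending reason to report
    return completed.stdout.strip() or None
```

A caller would treat a `None` result as "nothing to report" and move on to checking the job's final state, instead of failing the test because the query itself failed.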