scarlehoff/pyHepGrid

Resubmit failed jobs


Would it be possible to add an option for resubmitting jobs with status FAILED? Sometimes (not very often ;-) ) jobs fail for reasons unrelated to the job scripts, such as failed file transfers. In that case it would be nice to be able to resubmit only the jobs that failed, e.g. the 7 "subjobs" (in ganga-speak) of job N which failed. It could be an option
--resubmit_failed -j N
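
To make the request concrete, here is a rough sketch of how such an option might behave. It is not based on the actual pyHepGrid code: the parser, the `statuses` mapping, and the `submit` callback are all hypothetical stand-ins, and only the `--resubmit_failed -j N` spelling comes from the proposal above.

```python
import argparse

FAILED = "FAILED"  # status string as named in this issue


def build_parser():
    """Hypothetical front end exposing the proposed option."""
    parser = argparse.ArgumentParser(description="Sketch of a resubmission option")
    parser.add_argument("--resubmit_failed", action="store_true",
                        help="resubmit only the subjobs whose status is FAILED")
    parser.add_argument("-j", "--job", type=int,
                        help="identifier of the job whose subjobs are checked")
    return parser


def failed_subjobs(statuses):
    """Return the sorted indices of subjobs marked FAILED.

    `statuses` is an assumed mapping of subjob index -> status string.
    """
    return sorted(idx for idx, status in statuses.items() if status == FAILED)


def resubmit_failed(statuses, submit):
    """Call `submit(idx)` for every failed subjob and return those indices.

    `submit` is a placeholder for whatever actually sends a subjob to the grid.
    """
    resubmitted = []
    for idx in failed_subjobs(statuses):
        submit(idx)
        resubmitted.append(idx)
    return resubmitted
```

Subjobs that are still running or already finished would be left untouched, so only the failed ones are sent again.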

Hmmm, we already have a --resubmit flag (although it's for warmups only at the moment). I suspect the associated logic [programs.py] could be transplanted over to production mode relatively straightforwardly.

That would IMHO be a huge improvement, and would help reduce the sometimes large frustration caused by random failures.