restart-in-place: detect halt file from library to know when to stop restarting
adammoody opened this issue · 1 comments
Unless told otherwise, the scripts assume the job must always be restarted, even when the job actually ran to completion. To avoid auto-restarting a finished job, the scripts need to know that the job ended on purpose.
Note that the exit code of the launch command is not a sufficient signal, because some jobs return a non-zero exit code to convey other information, e.g., that the calculation went bad.
With SCR, we ended up writing a "halt" file in SCR_Finalize, and then we look for that "halt" file in the scripts. If we see it, we assume the job completed and we won't try to restart it. If there is no file, the scripts will try to restart the job.
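As a rough illustration of the halt-file check described above, here is a minimal Python sketch. The names `job_finished`, `run_with_restarts`, `launch`, and `max_restarts` are hypothetical, as is the halt file path passed in; this is not SCR's actual script logic or file layout. It assumes the library writes the halt file on a clean finalize and that no stale halt file exists before the first launch.

```python
import os
from typing import Callable


def job_finished(halt_file: str) -> bool:
    """Return True if the (hypothetical) halt file exists.

    Assumption: the library writes this file only when the job ends on purpose.
    """
    return os.path.exists(halt_file)


def run_with_restarts(launch: Callable[[], int], halt_file: str,
                      max_restarts: int = 3) -> int:
    """Launch the job, restarting until the halt file appears.

    `launch` is a hypothetical callable that runs the job and returns its
    exit code. Returns the total number of launches performed.
    """
    launches = 0
    while launches <= max_restarts:
        # The exit code is deliberately ignored: a non-zero value may just
        # carry application-specific information, not signal a failure.
        launch()
        launches += 1
        if job_finished(halt_file):
            # Halt file present: the job ended on purpose, so stop restarting.
            break
    return launches
```

The restart decision here depends only on the presence of the halt file, never on the exit code. The `max_restarts` cap is an addition for the sketch so the loop cannot run forever if the file is never written.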
This issue stayed inactive for a long time. Please reopen if still relevant.