SMART-Lab/smartdispatch

a command for terminating a whole job batch


Currently we have qdel <job ID>, which terminates a specified job, and qdel all, which kills all jobs.

But when several job batches are running, each with a large number of jobs (say 100 per batch), it is inconvenient to kill all the jobs in one specific batch.

We could add a script to the repo that extracts all job IDs belonging to a specified job batch and feeds them to qdel. To use it, we would just type something like

     smart_dispatch --killjobbatch <job batch id>

Thank you for letting us know about useful features smart-dispatch should have.
Right now, the job IDs are stored in SMART_DISPATCH_LOGS/{job_name}/job_ids.txt, so it should not be too hard to write a script that parses the file and calls qdel on each ID.
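
As a rough illustration of that idea, here is a minimal sketch in Python. It assumes job_ids.txt holds one job ID per line, which has not been confirmed here, and the names read_job_ids and kill_job_batch are hypothetical, not part of smart-dispatch. The run parameter exists only so the qdel call can be swapped out for a dry run.

```python
import subprocess
from pathlib import Path


def read_job_ids(job_ids_file):
    """Return the job IDs listed in the file, skipping blank lines.

    Assumption: one job ID per line.
    """
    with open(job_ids_file) as f:
        return [line.strip() for line in f if line.strip()]


def kill_job_batch(job_name, logs_dir="SMART_DISPATCH_LOGS", run=subprocess.call):
    """Call qdel on every job ID recorded for the given job batch.

    `run` defaults to subprocess.call; pass another callable (for example
    print) to see the commands without executing them.
    """
    job_ids = read_job_ids(Path(logs_dir) / job_name / "job_ids.txt")
    for job_id in job_ids:
        # One qdel invocation per job, matching the qdel <job ID> usage above.
        run(["qdel", job_id])
    return job_ids
```

Calling kill_job_batch("my_job_name", run=print) would print each qdel command instead of running it, which may be a useful check before killing a large batch.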