Tighter integration with batch schedulers
Closed this issue · 0 comments
avdgrinten commented
Right now, we can use Slurm to launch experiments but we cannot monitor and/or kill experiments through Slurm.
- In `simex e`, query the batch scheduler to determine whether jobs are still alive.
- Add a command to kill currently running jobs.
As an implementation strategy, we could store the job IDs of experiments in some file and use that to invoke `squeue` and `scancel`.
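
A minimal sketch of that strategy, not simex's actual implementation. The JSON file layout and the function names (`record_job_id`, `load_job_ids`, `alive_jobs`, `kill_jobs`) are hypothetical. It assumes `squeue` and `scancel` are on the `PATH`, and it lists all queued jobs and intersects them with the stored IDs rather than querying individual IDs. Array jobs, which `squeue` prints as `123_4`, are not handled here.

```python
import json
import subprocess
from pathlib import Path


def load_job_ids(path):
    """Read the stored job IDs (hypothetical layout: a JSON list of strings)."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else []


def record_job_id(path, job_id):
    """Add a job ID to the file at `path`, skipping duplicates."""
    ids = load_job_ids(path)
    if job_id not in ids:
        ids.append(job_id)
    Path(path).write_text(json.dumps(ids))


def run_command(argv):
    """Default runner: execute a command and return its stdout."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def alive_jobs(job_ids, run=run_command):
    """Return the subset of job_ids that squeue still lists."""
    # --noheader drops the column header; --format=%i prints only the job ID.
    out = run(["squeue", "--noheader", "--format=%i"])
    listed = {line.strip() for line in out.splitlines() if line.strip()}
    return [j for j in job_ids if j in listed]


def kill_jobs(job_ids, run=run_command):
    """Cancel the jobs that are still alive and return their IDs."""
    alive = alive_jobs(job_ids, run)
    if alive:
        run(["scancel", *alive])
    return alive
```

The `run` parameter is there only so the scheduler calls can be replaced in tests or swapped for another batch scheduler later.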