A simple Slurm-like job queue written in Python. It consists of a control daemon that collects and distributes jobs and keeps track of workers and job status. It supports `sbatch`, `squeue`, `sinfo`, and (not yet) `scancel`.
- Clone the repo
- `cd pyqueue`
- `python -m pip install .`

To test whether the installation was successful, run `pyqueue --version`.
To use pyqueue, start the queue daemon with `pyqueue start daemon`. You can check whether the daemon is running with `pyqueue sinfo`.

Now the daemon is ready to accept jobs, which you can submit with `sbatch`, e.g. `pyqueue sbatch Hello World!`, and monitor with `pyqueue squeue`. All jobs submitted to the daemon are collected and distributed among the worker processes that the daemon manages. To spin up a worker, run `pyqueue start worker`. You can check the status of the workers by calling `pyqueue sinfo`. The outputs for each job are stored in `./outputs/`.
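A typical session, using the commands from the walkthrough above (the output each command prints is omitted here):

```console
$ pyqueue start daemon           # launch the control daemon
$ pyqueue start worker           # spin up a worker managed by the daemon
$ pyqueue sbatch Hello World!    # submit a job
$ pyqueue squeue                 # monitor the job queue
$ pyqueue sinfo                  # check daemon and worker status
```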
To keep pyqueue somewhat modular, it is split into:

- `daemon.py`, a queue server
- `client.py`, a client
- `worker.py`, worker processes
- `jobs.py`, job specifications

Any of the four components can be extended and worked on relatively independently of the remaining parts. Other job types can be added as long as they inherit from `Job`; a sketch of what that might look like follows below.
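A minimal sketch of a custom job type, assuming `Job` exposes a `run()` method that the worker calls when executing the job (the import path and interface are assumptions; the actual API in `jobs.py` may differ):

```python
import time

from pyqueue.jobs import Job  # assumed import path


class SleepJob(Job):
    """Hypothetical job type that sleeps for a given number of seconds."""

    def __init__(self, seconds):
        super().__init__()
        self.seconds = seconds

    def run(self):
        # Assumed entry point: the worker calls run() to execute the job.
        time.sleep(self.seconds)
        return f"slept for {self.seconds} seconds"
```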
- Add logging (server / CtlDaemon and workers)
- Add error handling and exceptions for jobs (also Keyboard Interrupt of server / worker)
- Add tests
  - tests for all job types
  - tests for daemon
  - ==NEXT== tests for worker
  - tests for client
- Add Type Hinting and Documentation and comments
- Add users
- Add priorities
- Add timing of jobs
- setup.py
  - add setup.py
  - test setup.py in VM
- License
- System config file? (log_dir, output_dir, port, daemon address)
- Start/stop daemon from client
- make squeue nice
  - add flag for state
  - add flag to show finished
  - add flag to show jobtype
  - add flag to show user
  - add flag to show me
  - add flag to show id
- make sinfo nice
- Only allow one server to run at a time!
- make sbatch work nicely
- [-] make scancel work
- Send and receive jobs as pickle, rather than dict
- Keep jobs in file so when Server is killed, they can potentially be resumed
- Make client functions accept kwargs
- Register the workers automatically (up to max number of workers, as specified in kwargs)
- Remove finished jobs from queue (all or if they are too old)
- Change to a different protocol, e.g. HTTP (one that does not need pickling of objects)
- Look into the server option `register_instance(instance, allow_dotted_names=False)`, which could expose class variables and allow changing them without the dictionary hassle of updating them (see the sketch after this list)
- Worker is somewhat of a mess that needs fixing
- Add Worker I/O (output/error logs)
  - add process logs with .err and .out
- Add Worker
- Add multiprocessing.Pool worker option for running Callables
- Add CallableJob where job.run is just running a python function
- Add shutdown function at the end (orderly shutdown)
- Add option to rerun failed jobs (x times, at the end, requeue them...)
- DockerJob
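A rough sketch of the `register_instance` idea noted above, assuming the control daemon is served via `xmlrpc.server.SimpleXMLRPCServer` (the class name `CtlDaemon` is taken from the notes; its methods, the address, and the port are placeholders, not pyqueue's actual API):

```python
from xmlrpc.server import SimpleXMLRPCServer


class CtlDaemon:
    """Placeholder for the control daemon's RPC surface."""

    def __init__(self):
        self.jobs = {}

    def submit_job(self, job_dict):
        # Store the job under a fresh id and return the id to the client.
        job_id = len(self.jobs) + 1
        self.jobs[job_id] = job_dict
        return job_id

    def queue_status(self):
        return list(self.jobs)


server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
# register_instance exposes every public method of the instance as an RPC
# endpoint, so clients can call submit_job(...) etc. without registering each
# function by hand; allow_dotted_names stays False to avoid exposing
# attributes of attributes.
server.register_instance(CtlDaemon(), allow_dotted_names=False)
server.serve_forever()
```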