pyqueue

A simple, lightweight, Slurm-like job queue, extensible to multiple nodes.

License: GPL-3.0

pyqueue is a simple, Slurm-like job queue written in Python. It consists of a control daemon that collects and distributes jobs and keeps track of workers and job status.

Supports sbatch, squeue, and sinfo; scancel is not yet implemented.

Install

  1. Clone the repo
  2. cd pyqueue
  3. python -m pip install .

To check whether the installation was successful, run pyqueue --version

Getting started

To use pyqueue, start the queue daemon with pyqueue start daemon. You can check whether the daemon is running with pyqueue sinfo.

Now the daemon is ready to accept jobs. You can submit them with sbatch, e.g. pyqueue sbatch Hello World!, and monitor them with pyqueue squeue. All jobs submitted to the daemon are collected and distributed among the worker processes that the daemon manages. To spin up a worker, run pyqueue start worker; you can check its status with pyqueue sinfo. The outputs for each job are stored in ./outputs/.

Structure

To keep pyqueue somewhat modular, it is split into:

  • daemon.py, a queue server
  • client.py, a client
  • worker.py, worker processes
  • jobs.py, job specifications

Any of the four components can be extended and worked on relatively independently of the remaining parts. Other job types can be added as long as they inherit from Job.
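As an illustration, a new job type might look like the sketch below. The Job base class shown here is a minimal hypothetical stand-in (the real interface lives in jobs.py and may differ); only the inheritance pattern is the point:

```python
import subprocess


class Job:
    # Hypothetical stand-in for the base class in pyqueue's jobs.py.
    def __init__(self, job_id):
        self.job_id = job_id
        self.status = "PENDING"

    def run(self):
        raise NotImplementedError


class ShellJob(Job):
    """Example job type: runs a shell command and captures its output."""

    def __init__(self, job_id, command):
        super().__init__(job_id)
        self.command = command

    def run(self):
        result = subprocess.run(
            self.command, shell=True, capture_output=True, text=True
        )
        self.status = "COMPLETED" if result.returncode == 0 else "FAILED"
        return result.stdout
```

A worker could then call job.run() and store the returned output, e.g. ShellJob(1, "echo hello").run().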



To Dos:

General

Important

  • Add logging (server / CtlDaemon and workers)
  • Add error handling and exceptions for jobs (also Keyboard Interrupt of server / worker)

Nice to have

  • Add tests
    • tests for all job types
    • tests for daemon
    • ==NEXT== tests for worker
    • tests for client
  • Add type hints, documentation, and comments
  • Add users
  • Add priorities
  • Add timing of jobs
  • setup.py
    • add setup.py
    • test setup.py in VM
  • License
  • System config file? (log_dir, output_dir, port, daemon address)

Client

Important

  • Start/stop daemon from client

Nice to have

  • make squeue nice
    • add flag for state
    • add flag to show finished
    • add flag to show jobtype
    • add flag to show user
    • add flag to show me
    • add flag to show id
  • make sinfo nice

Server

Important

  • Only allow one server to run at a time!
  • make sbatch work nicely
  • [-] make scancel work
  • Send and receive jobs as pickle, rather than dict

Nice to have

  • Keep jobs in a file so that, when the server is killed, they can potentially be resumed
  • Make client functions accept kwargs
  • Register the workers automatically (up to max number of workers, as specified in kwargs)
  • Remove finished jobs from queue (all or if they are too old)
  • Change to a different protocol, e.g. HTTP (one that does not need pickling of objects)
  • Look into the server option register_instance(instance, allow_dotted_names=False), which could expose class variables and allow changing them without the dictionary hassle of updating them
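For reference, register_instance is part of Python's built-in xmlrpc.server. A minimal sketch of exposing a queue-like object that way is shown below; QueueDaemon, submit_job, and num_jobs are illustrative names, not pyqueue's actual API:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy


class QueueDaemon:
    """Illustrative stand-in for pyqueue's control daemon (not its real API)."""

    def __init__(self):
        self.jobs = []

    def submit_job(self, spec):
        # Store the job spec and return its position as a job id.
        self.jobs.append(spec)
        return len(self.jobs) - 1

    def num_jobs(self):
        return len(self.jobs)


# Port 0 lets the OS pick a free port.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False, allow_none=True)
server.register_instance(QueueDaemon())  # exposes submit_job and num_jobs
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy(f"http://localhost:{port}", allow_none=True)
job_id = client.submit_job({"cmd": "echo hello"})
```

Since XML-RPC marshals plain dicts, lists, and strings, this would also sidestep the pickling mentioned above, at the cost of not being able to send arbitrary objects.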

Worker

Important

  • Worker is somewhat of a mess that needs fixing
  • Add Worker I/O (output/error logs)
  • Add process logs with .err and .out
  • Add Worker

Nice to have

  • Add multiprocessing.Pool worker option for running Callables
  • Add CallableJob where job.run is just running a python function
  • Add shutdown function at the end (orderly shutdown)
  • Add option to rerun failed jobs (x times, at the end, requeue them...)
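The CallableJob idea above could be sketched roughly as follows; the class name comes from the to-do item, but the attributes and status strings here are assumptions, not pyqueue's actual interface:

```python
class CallableJob:
    """Proposed job type whose run() just calls a Python function.

    A worker backed by multiprocessing.Pool could dispatch jobs like
    this, as long as the callable and its arguments are picklable.
    """

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs
        self.result = None
        self.status = "PENDING"

    def run(self):
        self.result = self.func(*self.args, **self.kwargs)
        self.status = "COMPLETED"
        return self.result
```

For example, CallableJob(pow, 2, 10).run() would return 1024 and mark the job completed.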

Jobs

Important

Nice to have

  • DockerJob