Primary language: Python. License: Apache License 2.0.

plz

Helpers for running native Python functions as qsub jobs on the CLSP grid.

plz uses Dask and Dask-Jobqueue to distribute Python functions as SGE-managed jobs. There is no need to wrap Python scripts in bash scripts anymore!

Installation

pip install git+https://github.com/pzelasko/plz

Examples

CPU jobs:

def task(x):
    return x ** 2

import plz
plz.map(task, range(1000), jobs=10)

GPU jobs:

def task(x):
    import torch
    # .item() turns the CUDA tensor into a plain Python number before it is returned
    return torch.tensor([x] * 100, device='cuda').sum().item()

import plz
plz.map(task, range(1000), jobs=1, gpus=1)

Iterate over multiple sequences:

def task(x, y):
    return (x + y) ** 2
    
import plz
plz.map(task, range(1000), range(1000, 2000), jobs=10)
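The README does not say how the input sequences are paired. Assuming plz.map pairs them element-wise, the way Python's built-in map does, the call above should produce the same values as this local, single-process sketch. It can be a cheap way to check a task function before submitting anything to the grid.

```python
def task(x, y):
    return (x + y) ** 2

# Local stand-in: the built-in map pairs inputs element-wise,
# i.e. (x=0, y=1000), (x=1, y=1001), ...
# Assumption: plz.map follows the same pairing convention.
expected = list(map(task, range(1000), range(1000, 2000)))

print(expected[:3])  # first three results
```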

Using logs:

def task(x):
    import logging
    logging.info(f'Running job with input {x}')
    return x ** 2

import plz
plz.map(task, range(1000), jobs=10, log_dir='/path/to/logs')

Single task:

def task(x):
    return x ** 2

import plz
plz.run(task, 1)

Running on CLSP or COE grid

By default, plz.map and plz.run are configured to run on the CLSP grid. To run on the COE grid instead, pass grid='coe' to either function.
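As a small illustration, the earlier examples could be pointed at the COE grid as follows. The wrapper function run_on_coe is hypothetical and exists only to keep the snippet importable on machines where plz is not installed. The plz.map and plz.run calls use only the arguments described in this README.

```python
def task(x):
    return x ** 2

def run_on_coe():
    # Imported here so that defining this function does not require plz.
    import plz
    # Same calls as in the examples above, with grid='coe' added.
    squares = plz.map(task, range(1000), jobs=10, grid='coe')
    single = plz.run(task, 1, grid='coe')
    return squares, single
```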

Technical details

Under the hood, each run or map call creates a dedicated "mini-cluster" of Python worker "services" across which the tasks are distributed. This cluster has its own scheduling and load balancing, and it shuts down automatically as soon as all the inputs are processed.
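The lifecycle described above can be approximated directly with Dask and Dask-Jobqueue. This is a rough sketch of the general pattern, not plz's actual implementation. The function name map_on_sge and the cores and memory values are placeholders chosen for illustration. A real grid will usually also need a queue name and resource flags.

```python
def map_on_sge(func, *iterables, jobs=10, cores=1, memory="2GB"):
    """Run func over the inputs on a temporary SGE-backed Dask cluster."""
    # Imported lazily so this module loads on machines without Dask installed.
    from dask.distributed import Client
    from dask_jobqueue import SGECluster

    # Each "job" is one qsub submission that starts a Dask worker process.
    # The cores/memory values are placeholders, not plz's real defaults.
    with SGECluster(cores=cores, memory=memory) as cluster:
        cluster.scale(jobs=jobs)  # ask SGE for `jobs` worker jobs
        with Client(cluster) as client:
            # The Dask scheduler spreads the tasks across the workers.
            futures = client.map(func, *iterables)
            # Block until every result has been computed and collected.
            return client.gather(futures)
    # Leaving the `with` blocks closes the client and shuts the cluster down.
```

The context managers are what give the "shuts down automatically" behavior: the worker jobs are released once the results have been gathered, even if a task raises an exception.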