This project provides the code for CSAOpt worker nodes, which are deployed to target instances via Docker images. Workers pull work from the broker, compile models, run optimizations on GPU hardware and post results back to the broker. Initially written in C++11 (for CUDA), the project now uses Numba, which JIT-compiles a subset of Python (the so-called nopython mode) directly into CUDA kernels via PTX assembly.
Machines hosting the worker containers need to run docker (duh!) and are subject to the following requirements (mainly dictated by nvidia-docker2):
- GNU/Linux x86_64 with kernel version > 3.10
- Docker >= 1.12
- NVIDIA GPU with Architecture > Fermi (2.1)
- NVIDIA drivers ~= 361.93 (untested on older versions)
However, if you are looking at this repository, chances are you are interested in debugging worker code or your own models. If that is the case, you will be glad to hear that the debug-enabled version (enabled via the usual optimization config, refer to the CSAOpt documentation) runs without nvidia-docker: the worker falls back to Numba's GPU simulator and executes the worker code solely on the CPU (and thus much, much slower).
The models (i.e. the model-specific implementations of the Simulated Annealing routines) are serialized on the application master and transmitted to the workers. Workers use a rudimentary file template to generate the actual optimization code at runtime. This code is then loaded and JIT-compiled by Numba, which runs the optimization kernels on the GPU hardware.
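The template-and-generate step can be sketched with the standard library alone (the real worker uses its own file template and hands the result to Numba's JIT; the template text, function name, and model body below are all invented for illustration):

```python
# Hedged sketch of runtime code generation from a template.
from string import Template

# Hypothetical template: the model supplies the body of an acceptance
# function, the worker supplies the surrounding scaffolding.
kernel_template = Template("""
def acceptance_func(e_old, e_new, temp):
    $body
""")

# "Model" code as it might arrive, serialized, from the application master.
model_body = "return 1.0 if e_new < e_old else 0.5"

source = kernel_template.substitute(body=model_body)
namespace = {}
exec(source, namespace)  # the worker would JIT-compile this with Numba instead
print(namespace["acceptance_func"](2.0, 1.0, 0.1))  # lower energy -> 1.0
```

The key point is that the executable optimization code only exists at runtime, after the serialized model has been spliced into the template.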
The workers run on hardware with NVIDIA GPUs and use CUDA to run the parallelized optimization algorithm. To minimize platform dependencies, host machines need not provide any CUDA-specific packages: the worker Docker image already ships all CUDA requirements, and only the GPU driver is required on the host.
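That split of responsibilities (CUDA toolkit in the image, driver on the host) might look roughly like the following hypothetical Dockerfile; the base image tag, paths, and install steps are assumptions, not the project's actual build:

```dockerfile
# Hypothetical worker image sketch: CUDA comes from the base image,
# the host only needs the NVIDIA driver plus nvidia-docker2.
FROM nvidia/cuda:10.0-base

# Assumes conda is available in a prior layer (installation omitted here).
COPY environment.yml /opt/csaopt-worker/
RUN conda env create -f /opt/csaopt-worker/environment.yml

# Same invocation as the manual setup described below.
CMD ["dramatiq", "--processes", "1", "--threads", "3", "broker:broker", "worker.tasks.actors"]
```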
This image is not intended to be built by hand, as it is deployed automatically to the target machines using Docker. That being said, building it manually should still be relatively straightforward. Dependencies are managed by conda and are listed in environment.yml. To create an environment with all required dependencies, execute from the project root:
```shell
# Create environment and resolve/install all dependencies listed in environment.yml
conda env create
# After downloading all deps (might take a while)
source activate csaopt-worker
# Then, in the activated environment
dramatiq --processes 1 --threads 3 broker:broker worker.tasks.actors
```
This will start the broker client that accepts model deployments and optimization jobs.
Note that running this without docker and nvidia-docker requires an installed CUDA toolkit in addition to the required NVIDIA graphics drivers.
Kernels can also be executed manually, even without GPU hardware, using the debugging infrastructure of Numba. This works by setting the NUMBA_ENABLE_CUDASIM environment variable to 1, which switches Numba into CUDA simulator mode: kernels run on the CPU and can be inspected for debugging. Refer to the documentation for further details. You can also have a look at the test suite.
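For a whole worker run, the simulator can be switched on at invocation time; this reuses the dramatiq command from the setup above (whether your deployment passes the variable through to the worker process is configuration-dependent):

```shell
# Run the worker with Numba's CUDA simulator instead of real GPU hardware
NUMBA_ENABLE_CUDASIM=1 dramatiq --processes 1 --threads 3 broker:broker worker.tasks.actors
```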