deepRacin

Deep Resource-aware OpenCL Inference Networks

Deploy computation graphs (such as trained deep neural network models) to mobile or desktop OpenCL supporting platforms.
Automated, resource-aware graph scheduling and parametrization
Runs on Linux and Windows (macOS should also work but is untested)
Supports Nvidia, AMD, Intel and mobile (e.g. Mali) GPUs; OpenCL 1.1 is required

Note: This library is under development and in alpha status. While it should run on all operating systems and GPUs that support OpenCL 1.1, it has not been tested on most configurations. There is also no full documentation yet.

If you have questions, feedback, suggestions or if you want to contribute, feel free to contact me!

Workflow

Define a computation graph in Python, test it, and store it in the deepRacin format
Load the stored model in a C application
Initialize OpenCL or use an existing context and buffers
In a processing loop:
1. Feed data
2. Run the graph

Basic examples

These reduced examples are meant to convey how deepRacin is used, so configuration and data loading code is omitted. See examples/vgg16 or examples/squeezenet-v1.1 for full examples of networks that classify the 1000 ILSVRC2012 classes.

For Step 1 in Python:

import deepracin as dr
# Create empty graph
graph = dr.create_graph()

# Fill graph
# Feed node - Will be fed with data for each graph application
feed_node = dr.feed_node(graph, shape=(224, 224, 3))

# Conv2d node, given numpy arrays conv_weights and conv_biases
conv = dr.Conv2d(feed_node, shape, stride, activation='relu', weights=conv_weights, biases=conv_biases)

# MaxPooling node
pool = dr.Pooling(conv, 'max', shape, stride)

# FullyConnected node, given numpy arrays fc_weights and fc_biases
fc = dr.Fully_Connected(pool, shape, activation='relu', weights=fc_weights, biases=fc_biases)

# Mark output node
dr.mark_as_output(fc)

# Save deepracin graph
dr.save_graph(graph,model_path)

# Graph testing in python:
# Setup and schedule everything
dr.prepare(graph)

for img_data in img_paths:
    # Feed data (image loading is omitted; img_data stands for the loaded numpy array)
    dr.feed_data(feed_node, img_data)

    # Apply graph - returns one numpy array for each node marked as output
    fc_output = dr.apply(graph)
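
For a classification network such as the ones in examples/vgg16, the output array would typically be reduced to its highest-scoring classes. The following is a minimal sketch, assuming the output has been flattened to a plain sequence of scores; `top_k` is a hypothetical helper for illustration and not part of deepRacin.

```python
def top_k(scores, k=5):
    """Return (class_index, score) pairs for the k highest scores."""
    # Pair every score with its class index, then sort by score, best first
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up score vector with four classes instead of 1000
scores = [0.1, 0.7, 0.05, 0.15]
print(top_k(scores, k=2))
```

If the output is a numpy array, passing `fc_output.flatten()` as `scores` should work the same way.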

For Steps 2, 3 and 4 in C with a new OpenCL environment:

// Load Graph
net = dR_NewGraph();
dR_loadGraph(net,model_path,&nodeslist,&numnodes,&feedlist,&numfeeds);

// Mark Output Node
dR_setAsOutput(net,nodeslist[numnodes-1]);

// Initialize OpenCL
dR_initCL(net);

// Setup and schedule everything
dR_prepare(net);

// Get OpenCL buffers for outputs
dR_getOutputBuffers(net,outbuffers);

for(int i = 0; i<numImages;i++)
{
    // Feed data
    dR_feedData(net,feedlist[0],(cl_float*)data[i],0,buffersize*sizeof(cl_float));
    // Apply graph
    dR_apply(net);
    // Get output data
    dR_downloadArray(net,"", outbuffers[0],0,out_size*sizeof(cl_float),data_out);
}

or with an existing OpenCL context and buffers:

// Load Graph
net = dR_NewGraph();
dR_loadGraph(net,model_path,&nodeslist,&numnodes,&feedlist,&numfeeds);

// Use existing OpenCL context
dR_setClEnvironment(net, clContext, clPlatformId, clCommandQueue, clDeviceId);
dR_setDataFeedNodeBuffer(net,feedlist[0],existingCLMemPointer1);
dR_setPreexistingOutputBuffer(net,nodeslist[numnodes-1],existingCLMemPointer2);

// Setup and schedule everything
dR_prepare(net);

for(int i = 0; i<numImages;i++)
{
    ...
    // Apply graph
    dR_apply(net);
    ...
}

Getting Started

Dependencies of the C library:

OpenCL 1.1
Glib 2.0

Dependencies of the Python interface:

Numpy

Misc:

For the C part of the examples, libpng is required to load test images.
For building, CMake 2.8 (3.4 on Windows) is required.

Installation

On Linux:

Install glib > 2.6, OpenCL, libpng and zlib
Check out the deepRacin git repository
Navigate to the checkout folder
Create a build directory and navigate there
```
mkdir build
cd build
```
Run cmake. Choose ON or OFF for each option (without the angle brackets). Note that Python and Numpy are required for installing the Python interface, and libpng is required for building the examples.
```
cmake .. -DINSTALL_PYTHON_INTERFACE=<ON|OFF> -DCOMPILE_EXAMPLES=<ON|OFF>
```
Install the library
```
sudo make install
```

On Windows: (Overview, detailed version not available at the moment)

Download and compile glib > 2.6, libpng and zlib with Visual Studio, and install OpenCL
Check out the deepRacin git repository
Use CMake to configure
Set all missing paths to OpenCL, glib, zlib and libpng
Adjust Install Prefix
Generate Project
Build INSTALL Target of the generated Visual Studio Project

Currently implemented graph nodes

DataFeedNode
DNN Nodes

Conv2d (direct, Winograd (2x2, 3x3) and specialized 1x1 implementations)
Pooling (currently Max, Avg)
FullyConnected
Activation functions (currently ReLU, Linear)
Softmax
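
To give an intuition for the Winograd variant of Conv2d listed above, here is a sketch of the one-dimensional building block F(2, 3), which produces two outputs of a 3-tap filter with four multiplications instead of six. The 2D case F(2x2, 3x3) nests this transform along both axes. This is a plain Python illustration of the general technique, not deepRacin's OpenCL kernel, and the function names are made up for this example.

```python
import math


def direct_f23(d, g):
    """Two outputs of a 3-tap filter g slid over 4 inputs d (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2, 3), using 4 multiplications."""
    # Filter transform (can be precomputed once per filter)
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform combined with the element-wise products
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
print(direct_f23(d, g), winograd_f23(d, g))
```

Both functions should agree up to floating-point rounding; the saving comes from moving the filter transform out of the inner loop.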

Math Operations

Add (with tensor or scalar)
Sub (with tensor or scalar)
Mul (with tensor or scalar)
Div (with tensor or scalar)
Pow (with tensor or scalar)
Log
Sqrt
Exp
Fill
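
The "with tensor or scalar" variants above can be pictured as follows. This sketch assumes that a scalar operand is applied to every element and that a tensor operand must match the input element for element; the actual broadcasting rules of deepRacin are not documented here, and `elementwise` is a hypothetical name.

```python
import operator


def elementwise(op, a, b):
    """Apply op to a flat tensor a and either a same-sized tensor or a scalar b."""
    if isinstance(b, (int, float)):
        # Scalar operand: combine it with every element of a
        return [op(x, b) for x in a]
    if len(a) != len(b):
        raise ValueError("tensor operands must have the same number of elements")
    # Tensor operand: combine matching elements
    return [op(x, y) for x, y in zip(a, b)]


a = [1.0, 2.0, 3.0]
print(elementwise(operator.add, a, 10.0))             # scalar operand
print(elementwise(operator.mul, a, [2.0, 2.0, 2.0]))  # tensor operand
```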

Transforms

Concat
Slice

Image

Normalization (per image to given mean and stddev)
CropOrPad
Upscaling
RGBtoGray
MaskDependentFilter (applies one of k image filters to each pixel, depending on integer mask)
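
Two of the image nodes above are easier to grasp with a reference computation. The sketch below shows per-image normalization to a given mean and stddev, and a mask-dependent filter that picks one of k kernels per pixel. It rests on several assumptions that the source does not confirm: 3x3 kernels, zero padding at the borders, and the population standard deviation. Function names are hypothetical.

```python
import math


def normalize(pixels, target_mean=0.0, target_stddev=1.0):
    """Shift and scale one image (flat list) to the given mean and stddev."""
    n = len(pixels)
    mean = sum(pixels) / n
    stddev = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if stddev == 0.0:
        # Constant image: nothing to scale, just move it to the target mean
        return [target_mean] * n
    return [(p - mean) / stddev * target_stddev + target_mean for p in pixels]


def mask_dependent_filter(image, mask, filters):
    """Filter each pixel with the 3x3 kernel selected by mask[y][x].

    image and mask are 2D lists of equal size; filters is a list of k kernels.
    Pixels outside the image are treated as zero (an assumption).
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            kernel = filters[mask[y][x]]
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    sy, sx = y + ky - 1, x + kx - 1
                    if 0 <= sy < height and 0 <= sx < width:
                        acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = acc
    return out


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box_sum = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
image = [[1.0, 2.0], [3.0, 4.0]]
mask = [[0, 1], [0, 0]]  # the pixel at row 0, column 1 uses box_sum
print(mask_dependent_filter(image, mask, [identity, box_sum]))
print(normalize([1.0, 2.0, 3.0]))
```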

All implementations are given as OpenCL host and device code.

Acknowledgement

This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project B2.