Primary language: C. License: Apache License 2.0.

deepRacin

Deep Resource-aware OpenCL Inference Networks

  • Deploy computation graphs (such as trained deep neural network models) to mobile or desktop platforms that support OpenCL.
  • Automated, resource-aware graph scheduling and parametrization
  • Runs on Linux and Windows (and on macOS, although this is untested)
  • Nvidia, AMD, Intel and mobile (Mali) GPUs (OpenCL 1.1 required)

Note: This library is under development and in alpha status. While it should run on every operating system and GPU that supports OpenCL 1.1, most configurations have not been tested. There is also no full documentation yet.

If you have questions, feedback, suggestions or if you want to contribute, feel free to contact me!

Workflow

  1. Define a computation graph in Python, test it and store it in the deepRacin format
  2. Load the stored model in a C application
  3. Initialize OpenCL or use an existing context and buffers
  4. In a processing loop:
    1. Feed data
    2. Run the graph

Basic examples

These reduced examples are meant to convey how deepRacin is used, so configuration and data-loading code is omitted. See examples/vgg16 or examples/squeezenet-v1.1 for full examples of networks that classify the 1000 ILSVRC2012 classes.

For Step 1 in Python:

import deepracin as dr
# Create empty graph
graph = dr.create_graph()

# Fill graph
# Feed node - Will be fed with data for each graph application
feed_node = dr.feed_node(graph, shape=(224, 224, 3))

# Conv2d node, given numpy arrays conv_weights and conv_biases
conv = dr.Conv2d(feed_node, shape, stride, activation='relu', weights=conv_weights, biases=conv_biases)

# MaxPooling node (pooling type, shape and stride passed positionally)
pool = dr.Pooling(conv, 'max', shape, stride)

# FullyConnected node, given numpy arrays fc_weights and fc_biases
fc = dr.Fully_Connected(pool, shape, activation='relu', weights=fc_weights, biases=fc_biases)

# Mark output node
dr.mark_as_output(fc)

# Save deepracin graph
dr.save_graph(graph,model_path)

# Graph testing in python:
# Setup and schedule everything
dr.prepare(graph)

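# images: iterable of preloaded numpy arrays matching the feed node's shape (loading code omitted)
for img_data in images:
    # Feed data
    dr.feed_data(feed_node, img_data)

    # Apply graph - returns one numpy array for each node marked as output
    fc_output = dr.apply(graph)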
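
For a classification network, the array returned for the output node usually has to be turned into a ranked list of classes. The following is a minimal sketch of that post-processing step, assuming the output is a flat 1-D vector with one score per class. The helper name `top_k` and the example scores are hypothetical and not part of deepRacin; the function uses only the Python standard library and also accepts a 1-D numpy array.

```python
def top_k(scores, k=5):
    """Return the k highest-scoring (class_index, score) pairs, best first.

    `scores` is assumed to be a flat sequence with one score per class,
    for example a Python list or a 1-D numpy array.
    """
    # Pair every score with its class index, then sort by score, descending.
    indexed = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(index, float(score)) for index, score in indexed[:k]]


# Hypothetical scores for a 6-class problem, standing in for fc_output.
example_scores = [0.05, 0.10, 0.60, 0.02, 0.20, 0.03]
best = top_k(example_scores, k=3)
for class_index, score in best:
    print(class_index, score)
```

With the ILSVRC2012 networks from the examples folder, the class indices would still have to be mapped to label names, which is not shown here.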

For Steps 2, 3 and 4 in C with a new OpenCL environment:

// Load Graph
net = dR_NewGraph();
dR_loadGraph(net,model_path,&nodeslist,&numnodes,&feedlist,&numfeeds);

// Mark Output Node
dR_setAsOutput(net,nodeslist[numnodes-1]);

// Initialize OpenCL
dR_initCL(net);

// Setup and schedule everything
dR_prepare(net);

// Get OpenCL buffers for outputs
dR_getOutputBuffers(net,outbuffers);

for(int i = 0; i<numImages;i++)
{
    // Feed data
    dR_feedData(net,feedlist[0],(cl_float*)data[i],0,buffersize*sizeof(cl_float));
    // Apply graph
    dR_apply(net);
    // Get output data
    dR_downloadArray(net,"", outbuffers[0],0,out_size*sizeof(cl_float),data_out);
}
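
As a worked check of the sizes passed in the loop above, the following sketch computes the element count and byte size of one input. It assumes that `buffersize` denotes the number of float elements in one input and that the feed node has the shape (224, 224, 3) from the Python example; both are assumptions, as the C snippet does not define them. The helpers `element_count` and `byte_size` are illustrative names, and Python is used here only to keep the arithmetic executable.

```python
import math

# cl_float is a 32-bit IEEE 754 float in OpenCL, so it occupies 4 bytes.
BYTES_PER_CL_FLOAT = 4


def element_count(shape):
    """Number of scalar elements in a tensor of the given shape."""
    return math.prod(shape)


def byte_size(shape):
    """Size in bytes of a cl_float tensor of the given shape."""
    return element_count(shape) * BYTES_PER_CL_FLOAT


# Assumed feed shape, taken from the Python example (height, width, channels).
feed_shape = (224, 224, 3)
buffersize = element_count(feed_shape)  # elements per input image
feed_bytes = byte_size(feed_shape)      # corresponds to buffersize*sizeof(cl_float)
print(buffersize, feed_bytes)
```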

Or, with an existing OpenCL context and buffers:

// Load Graph
net = dR_NewGraph();
dR_loadGraph(net,model_path,&nodeslist,&numnodes,&feedlist,&numfeeds);

// Use existing OpenCL context
dR_setClEnvironment(net, clContext, clPlatformId, clCommandQueue, clDeviceId);
dR_setDataFeedNodeBuffer(net,feedlist[0],existingCLMemPointer1);
dR_setPreexistingOutputBuffer(net,nodeslist[numnodes-1],existingCLMemPointer2);

// Setup and schedule everything
dR_prepare(net);

for(int i = 0; i<numImages;i++)
{
    ...
    // Apply graph
    dR_apply(net);
    ...
}

Getting Started

Dependencies of the C library:

  • OpenCL 1.1
  • Glib 2.0

Dependencies of the Python interface:

  • Numpy

Misc:

  • For the C part of the examples, libpng is required to load test images.
  • For building, CMake 2.8 (3.4 on Windows) is required.

Installation

On Linux:

  1. Install GLib (version > 2.6), OpenCL, libpng and zlib
  2. Check out the deepRacin Git repository
  3. Navigate to the checkout folder
  4. Create a build directory and navigate to it
    mkdir build
    cd build
  5. Run CMake. Set each option to ON or OFF (without the angle brackets). Note that Python and Numpy are required to install the Python interface, and libpng is required to build the examples
    cmake .. -DINSTALL_PYTHON_INTERFACE=<ON|OFF> -DCOMPILE_EXAMPLES=<ON|OFF>
  6. Install the library
    sudo make install

On Windows (overview only; detailed instructions are not available at the moment):

  1. Download and compile GLib (version > 2.6), libpng and zlib with Visual Studio, and install OpenCL
  2. Check out the deepRacin Git repository
  3. Configure the project with CMake
  4. Set all missing paths to OpenCL, GLib, zlib and libpng
  5. Adjust the install prefix (CMAKE_INSTALL_PREFIX)
  6. Generate the project
  7. Build the INSTALL target of the generated Visual Studio project

Currently implemented graph nodes

  • DataFeedNode
  • DNN Nodes
    • Conv2d (direct, Winograd (2x2, 3x3) and specialized 1x1 implementations)
    • Pooling (currently Max, Avg)
    • FullyConnected
    • Activation functions (currently ReLU, Linear)
    • Softmax
  • Math Operations
    • Add (with tensor or scalar)
    • Sub (with tensor or scalar)
    • Mul (with tensor or scalar)
    • Div (with tensor or scalar)
    • Pow (with tensor or scalar)
    • Log
    • Sqrt
    • Exp
    • Fill
  • Transforms
    • Concat
    • Slice
  • Image
    • Normalization (per image to given mean and stddev)
    • CropOrPad
    • Upscaling
    • RGBtoGray
    • MaskDependentFilter (applies one of k image filters to each pixel, depending on integer mask)
All implementations are given as OpenCL host and device code.
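
To give some intuition for the Winograd variant listed under Conv2d: the notation (2x2, 3x3) is commonly read as F(2x2, 3x3), meaning that a 2x2 tile of outputs is computed for a 3x3 filter with fewer multiplications than the direct method. The sketch below shows the one-dimensional building block F(2, 3) in plain Python, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It is an illustration of the general technique only, not deepRacin's OpenCL kernel, and the function names are hypothetical.

```python
def direct_conv_1d(d, g):
    """Two outputs of a 3-tap filter g over a 4-element tile d (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """The same two outputs using Winograd F(2, 3) (4 multiplications)."""
    # Filter transform: depends only on g, so it can be precomputed once.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by the 4 element-wise multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: combine the products into the two results.
    return [m1 + m2 + m3, m2 - m3 - m4]


tile = [1.0, 2.0, 3.0, 4.0]
taps = [0.5, -1.0, 2.0]
reference = direct_conv_1d(tile, taps)
fast = winograd_f23(tile, taps)
print(reference, fast)
```

Nesting this transform along both image axes gives the two-dimensional case, which needs 16 multiplications per 2x2 output tile instead of 36.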

Acknowledgement

This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project B2.