DLPrimitives

This project aims to provide cross-platform OpenCL tools for deep learning training and inference.

Today, most deep learning training is done on NVidia GPUs using the closed-source CUDA and cuDNN libraries. Using AMD or Intel GPUs is either challenging or virtually impossible. For example, AMD provides the ROCm platform, but it does not yet support RDNA GPUs (more than a year after their release), does not support APUs, and does not support any operating system other than Linux.

Goals

  • Create an open-source, cross-platform deep learning primitives library similar to cuDNN or MIOpen that supports multiple GPU architectures.
  • Create an inference library with minimal dependencies for efficient inference on any modern GPU, similar to TensorRT or MIGraphX.
  • Create a minimalistic deep learning framework as a proof of concept (POC) of the library's capabilities and performance.
  • Long shot: integrate into existing large-scale deep learning projects such as PyTorch, TF and MXNet, so that the vendor-independent, open-source OpenCL API becomes a first-class citizen for deep learning.

Please note that this is a work in progress, still in its first, preliminary stages.

Documentation

The documentation is published at http://dlprimitives.org/docs/

Features Matrix

Operator         | Features                            | Computation
Softmax          | -                                   | Fwd
SoftmaxWithLoss  | -                                   | Fwd, Bwd
Elementwise      | ax+by, max(ax,by), ax*y             | Fwd, Bwd
Concat           | -                                   | Fwd, Bwd
Slice            | -                                   | Fwd, Bwd
MaxPool2d        | -                                   | Fwd, Bwd
AvgPool2d        | -                                   | Fwd, Bwd
GlobalMaxPool2d  | -                                   | Fwd, Bwd
GlobalAvgPool2d  | -                                   | Fwd, Bwd
Inner Product    | -                                   | Fwd, Bwd
BatchNorm2D      | -                                   | Fwd, Bwd
Conv2d           | GEMM, Winograd, Depthwise Separable | Fwd, Bwd
TransposedConv2d | GEMM, Winograd, Depthwise Separable | Fwd, Bwd
Activation       | relu, sigmoid, tanh, relu6          | Fwd, Bwd
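The Conv2d and TransposedConv2d rows list Winograd as one of the available algorithms. To show where its savings come from, here is a minimal 1D sketch of the textbook Winograd F(2,3) minimal filtering algorithm, which produces 2 outputs of a 3-tap filter with 4 multiplications instead of 6. It is an illustration only: the function names are hypothetical, the example is 1D rather than 2D, and DLPrimitives' actual OpenCL kernels are not reproduced here.

```python
# Hypothetical illustration of Winograd F(2,3); not the DLPrimitives API or kernels.

def direct_corr_2x3(d, g):
    """Reference: 2 outputs of a 3-tap correlation (what CNNs call 'convolution')."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs from a 4-element input tile,
    using 4 multiplications instead of 6.

    The filter-side factors below depend only on the weights, so in a real
    convolution they would be computed once per filter and reused.
    """
    g_plus = (g[0] + g[1] + g[2]) / 2
    g_minus = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_plus
    m3 = (d[2] - d[1]) * g_minus
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]


def conv1d_winograd(signal, g):
    """'Valid' 1D correlation of `signal` with 3-tap filter `g`, tiled in F(2,3) steps.

    Assumes len(signal) - 2 is even, so the 2-output tiles cover the result exactly.
    """
    if len(signal) < 4 or (len(signal) - 2) % 2:
        raise ValueError("this sketch needs len(signal) >= 4 and len(signal) - 2 even")
    out = []
    # Consecutive tiles overlap by 2 input elements and each yields 2 outputs.
    for i in range(0, len(signal) - 3, 2):
        out.extend(winograd_f23(signal[i:i + 4], g))
    return out
```

In 2D the same idea is applied to both dimensions of a tile, which is why Winograd is typically offered alongside GEMM as an alternative for small (e.g. 3x3) kernels rather than as a replacement for it.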

Solvers: SGD, Adam
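For reference, the standard update rules behind these two solvers can be sketched in a few lines of Python. This is a minimal sketch, assuming PyTorch-style momentum for SGD and the default Adam hyperparameters from Kingma and Ba (lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8); the class names, parameter names and defaults are illustrative assumptions, not DLPrimitives' solver API.

```python
import math

# Hypothetical solver classes illustrating the update rules; not the DLPrimitives API.


class SGD:
    """SGD with optional momentum: v = momentum * v + g; w -= lr * v."""

    def __init__(self, lr=0.01, momentum=0.0):
        self.lr = lr
        self.momentum = momentum
        self.velocity = None  # lazily sized to match the weights

    def step(self, weights, grads):
        """Update `weights` (a list of floats) in place using `grads`."""
        if self.velocity is None:
            self.velocity = [0.0] * len(weights)
        for i, g in enumerate(grads):
            self.velocity[i] = self.momentum * self.velocity[i] + g
            weights[i] -= self.lr * self.velocity[i]


class Adam:
    """Adam with bias-corrected first and second moment estimates."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # running mean of gradients
        self.v = None  # running mean of squared gradients
        self.t = 0     # steps taken so far, used for bias correction

    def step(self, weights, grads):
        """Update `weights` (a list of floats) in place using `grads`."""
        if self.m is None:
            self.m = [0.0] * len(weights)
            self.v = [0.0] * len(weights)
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Correct the bias toward zero caused by initialising m and v at 0.
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            weights[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
```

For example, minimising (w - 3)^2 amounts to calling `step(w, [2 * (w[0] - 3.0)])` repeatedly on `w = [0.0]`. Keeping the per-parameter state (velocity, m, v, t) inside the solver object is what lets the update continue correctly from one iteration to the next.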

Validated Networks

Network      | Source of model    | Operation
AlexNet      | torchvision.models | Inference
VGG16        | torchvision.models | Inference
ResNet50     | torchvision.models | Inference
ResNet18     | torchvision.models | Inference
MobileNet v2 | torchvision.models | Inference

The networks were exported from PyTorch to ONNX and imported into DLPrimitives, and their results were compared on sample images. Note that currently only inference is validated at the network level; backpropagation is covered by per-layer regression tests.
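Per-layer backpropagation tests are commonly built around a gradient check: the analytic backward pass of a layer is compared with central finite differences of its forward pass. The Python sketch below applies this general technique to the ax+by and max(ax,by) modes of the Elementwise operator. All names (`numeric_grad`, `check_layer` and the `elementwise_*` functions) are hypothetical, and this is not necessarily the regression procedure DLPrimitives itself uses.

```python
# Hypothetical gradient-check sketch; not the DLPrimitives test suite or API.


def elementwise_sum_fwd(x, y, a, b):
    """Forward pass of z = a*x + b*y."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]


def elementwise_sum_bwd(dz, a, b):
    """Backward pass of z = a*x + b*y: dx = a*dz, dy = b*dz."""
    return [a * g for g in dz], [b * g for g in dz]


def elementwise_max_fwd(x, y, a, b):
    """Forward pass of z = max(a*x, b*y)."""
    return [max(a * xi, b * yi) for xi, yi in zip(x, y)]


def elementwise_max_bwd(x, y, dz, a, b):
    """Backward pass of z = max(a*x, b*y): the gradient goes to the winning input
    (ties are sent to x in this sketch)."""
    dx, dy = [], []
    for xi, yi, g in zip(x, y, dz):
        if a * xi >= b * yi:
            dx.append(a * g)
            dy.append(0.0)
        else:
            dx.append(0.0)
            dy.append(b * g)
    return dx, dy


def numeric_grad(f, v, eps=1e-6):
    """Central finite-difference gradient of scalar function f at point v."""
    grad = []
    for i in range(len(v)):
        plus = v[:]
        plus[i] += eps
        minus = v[:]
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def check_layer(fwd, bwd, x, y, dz, tol=1e-5):
    """Compare bwd(x, y, dz) with numeric gradients of loss = sum(dz * fwd(x, y))."""
    def loss(xx, yy):
        return sum(g * z for g, z in zip(dz, fwd(xx, yy)))

    dx, dy = bwd(x, y, dz)
    ndx = numeric_grad(lambda xx: loss(xx, y), x)
    ndy = numeric_grad(lambda yy: loss(x, yy), y)
    return all(abs(p - q) < tol for p, q in zip(dx + dy, ndx + ndy))
```

Weighting the outputs by an arbitrary upstream gradient `dz` turns each layer into a scalar loss, so a single check exercises the whole backward pass. For max-like operators the test inputs should avoid ties, where the function is not differentiable.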

Tested GPUs

Device       | Vendor | Notes
RX 6600XT    | AMD    | ROCr
RX 560       | AMD    | 16 CU model; ROCm, PAL, Clover
HD 530       | Intel  | i5-6600, NEO driver
GTX 960      | NVidia | -
GTX 1080     | NVidia | -
RTX 2060S    | NVidia | -
Mali-G52 MC2 | ARM    | performance not optimised yet

Devices tested on Windows: AMD RX 560 and NVidia GTX 960.

Other features

  • Network object for inference
  • ONNX-to-DLPrimitives model converter