Compute Library for Deep Neural Networks (clDNN) is an open source performance
library for Deep Learning (DL) applications intended for acceleration of
DL Inference on Intel® Processor Graphics – including HD Graphics and
Iris® Graphics.
clDNN includes highly optimized building blocks for implementation of
convolutional neural networks (CNN) with C and C++ interfaces. We created
this project to enable the DL community to innovate on Intel® processors.
Usages supported: Image recognition, image detection, and image segmentation.
Validated Topologies: AlexNet*, VGG(16,19)*, GoogleNet(v1,v2,v3)*, ResNet(50,101,152)*, Faster R-CNN*, Squeezenet*, SSD_googlenet*, SSD_VGG*, PVANET*, PVANET_REID*, age_gender*, FCN* and yolo*.
As with any technical preview, APIs may change in future updates.
clDNN is licensed under Apache License Version 2.0.
clDNN uses 3rd-party components licensed under the following licenses:
- boost under Boost* Software License - Version 1.0
- googletest under Google* License
- OpenCL™ ICD and C++ Wrapper under Khronos™ License
- RapidJSON under Tencent* License
The latest clDNN documentation is at GitHub pages.
There is also inline documentation available that can be generated with Doxygen.
Accelerate Deep Learning Inference with Intel® Processor Graphics whitepaper link.
clDNN is also released together with the Intel® OpenVINO™ Toolkit, which contains:
- Model Optimizer: a Python*-based command line tool that imports trained models from popular deep learning frameworks such as Caffe*, TensorFlow*, and Apache MXNet*.
- Inference Engine: an execution engine that uses a common API to deliver inference solutions on the platform of your choice (for example, a GPU with the clDNN library).
You can find more information here.
New features:
- 3 spatial dimensions support in convolution primitive (3D convolution)
- reverse primitive
- arg_max_min support for i8/s8/i32/i64 types
- concatenation support for bfzyx (5D) format
Bug fixes:
- fixes in primitive fusing pass (for i8/s8 types)
- fixes in graph optimizer (reshape primitive)
- overflow/underflow fixes for eltwise (i8/s8)
- fixes for convolution-eltwise primitive
- fixes for convolution primitive (depth-wise case)
- perf fixes for events pool
- fixes for pooling primitive (u8)
- fixes for deconvolution primitive
- fixes for fc primitive
- fixes for batch_norm primitive
UX:
- refactored and cleaned up JIT constants generation mechanism
- refactored kernel selection mechanism
- removed legacy device info mechanism
Performance:
- convolution primitive optimizations (for byxf, for MMAD-based, for byxf fp16, for bfyx fp16)
- fc primitive optimizations (for byxf)
- pooling primitive optimizations (for byxf, bfyx)
- convolution-relu primitive fusing (i8 -> s8 case)
- eltwise primitive optimizations (for byxf)
- fused convolution-eltwise primitive optimizations (IMAD-based)
- block-based optimizations for fp16 primitives
New features:
- added max mode for contract primitive
- added one_hot primitive
- optional explicit output data type support for all primitives
Bug fixes:
- fix for graph optimizer (crop primitive)
- fix for processing order (deconvolution primitive)
- fix for convolution-eltwise primitive
UX:
- cache.json is searched for in the library directory
Performance:
- optimizations for lstm_gemm primitive
New features:
- events pool
- group support in convolution and deconvolution primitives
- broadcastable inputs support for eltwise primitive
- asymmetric padding for convolution primitive
- fused convolution-eltwise primitive (API extension)
- auto-calculated output shape support for reshape primitive
- crop support for i8/s8/i32/i64 types
- broadcast axis support for broadcast primitive
- logic and comparison operations support for eltwise primitive
Bug fixes:
- added required alignment checks for some fc implementations
- added lstm support for f16 (half) type
- reorders for fc moved to graph compiler
- primitive fusing and reorder fixes
UX:
- added internal core tests project
- refactored optimizations pass manager and passes
Performance:
- optimized concatenation during upsampling (unpool)
- IMAD-based optimizations for convolution, fc, eltwise and pooling primitives (i8/s8)
- convolution-eltwise fusing optimizations
- partial writes optimizations for block-based kernels
- gtests code refactor
- buildbreak fix
New features:
- pyramidRoiAlign primitive
- multiple axes support for reverse mode in index_select
- eltwise min/max/mod support for i8/i32/i64
- broadcast support for i32/i64
Bug fixes:
- memory leak fixes
- in-place reshape
- no padding for output primitives
UX:
- RapidJSON library for auto-tune cache
- less dependencies in program.cpp
- do not throw error, when device not validated
- global pooling in c API
- optimized padding for convolution
New features:
- throttle hints
- extended border and tile
- GPU implementation of Detection Output
- More cases for BatchNorm primitive
Bug fixes:
- GEMM fix (align with ONNX)
- memory leak fix in memory pool
- increase FC precision for fp16 (fp32 accu)
Performance:
- cache for new topologies and devices
- conv1x1 with stride >1 into eltwise optimization
New features:
- condition primitive
- fused convolution with bn and scale (backprop)
- scale/shift and mean/var as an output in batch norm
- add LSTM output selection
Bug fixes:
- memory pool fixes
UX:
- downgrade to cxx11
- add support for u8 data type in custom primitive
- library size optimizations
Performance:
- in place concatenation optimization
- conv1x1 with stride >1 into eltwise optimization
New features:
- local convolution
- eltwise with stride
New features:
- select index primitive
- gemm primitive
Bug fixes:
- fix for output format in fully connected primitive
New features:
- log2 activation function
- support for i32 and i64 types
- select primitive
- border primitive
- tile primitive
Bug fixes:
- dilation > input size fix
New features:
- lstm primitive
- average unpooling primitive
- serialization - dump weights, biases and kernels
- scale grad for input and weights primitive
Bug fixes:
- wrong gws in concatenation
- int8 layers
- convolution depthwise bias concatenation
- params in engine_info
- mutable_data filler
- momentum calculation
UX:
- kernel selector renaming
- bfyx_yxfb batched reorder
- code cleanups
- primitives allocation order
New features:
- support for img_info=4 in proposal_gpu
- support images format in winograd
- support for 2 or more inputs in eltwise
- priority and throttle hints
- deconvolution_grad_input primitive
- fc_grad_input and fc_grad_weights primitives
Bug fixes:
- tensor fixes (i.e. less operator fix)
- cascade concat fixes
- winograd fixes for bfyx format
- auto-tuning fixes for weights calculation
UX:
- memory pool (reusing memory buffers)
- added chosen kernel name in graph dump
- flush memory functionality
Performance:
- graph optimizations
- depth-concatenation with fused relu optimization
- winograd optimizations
- deconvolution optimizations (e.g. bfyx opt)
New features:
- fused winograd
- image support for weights
- yolo_region primitive support
- yolo_reorg primitive support
Bug fixes:
- winograd bias fix
- mean subtract fix
UX:
- update boost to 1.64.0
- extend graph dumps
Performance:
- update offline caches for newer drivers
- conv1x1 byxf optimization
- conv1x1 with images
- cascade depth concatenation fuse optimization
New features:
- split primitive
- upsampling primitive
- add preliminary Coffee Lake support
- uint8 weights support
- versioning
- offline autotuner cache
- Winograd phase 1 - not used yet
Bug fixes:
- in-place crop optimization bug fix
- output spatial padding in yxfb kernels fix
- local work sizes fix in softmax
- underflow fix in batch normalization
- average pooling corner case fix
UX:
- graph logger, dumps Graphviz format files
- extended documentation with API diagram and graph compilation steps
Performance:
- softmax optimization
- lrn within channel optimization
- priorbox optimization
- constant propagation
New features:
- OOOQ execution model implementation
- depthwise separable convolution implementation
- kernel auto-tuner implementation
Bug fixes:
- dump hidden layer fix
- run single layer fix
- reshape fix
UX:
- enable RTTI
- better error handling/reporting
Performance:
- lrn optimization
- dynamic pruning for sparse fc layers
- reorder optimization
- concatenation optimization
- eltwise optimization
- activation fusing
Added:
- kernel selector
- custom layer
Changed:
- performance improvements
- bug fixes (deconvolution, softmax, reshape)
- apply fixes from community reported issues
Added:
- step by step tutorial
Changed:
- performance optimization for: softmax, fully connected, eltwise, reshape
- bug fixes (conformance)
- initial drop of clDNN
Please report issues and suggestions via GitHub issues.
We welcome community contributions to clDNN. If you have an idea how to improve the library:
- Share your proposal via GitHub issues
- Ensure you can build the product and run all the examples with your patch
- In the case of a larger feature, create a test
- Submit a pull request
We will review your contribution and, if any additional fixes or modifications are necessary, may provide feedback to guide you. When accepted, your pull request will be merged into our internal and GitHub repositories.
clDNN supports Intel® HD Graphics and Intel® Iris® Graphics and is optimized for:
- Codename Skylake:
- Intel® HD Graphics 510 (GT1, client market)
- Intel® HD Graphics 515 (GT2, client market)
- Intel® HD Graphics 520 (GT2, client market)
- Intel® HD Graphics 530 (GT2, client market)
- Intel® Iris® Graphics 540 (GT3e, client market)
- Intel® Iris® Graphics 550 (GT3e, client market)
- Intel® Iris® Pro Graphics 580 (GT4e, client market)
- Intel® HD Graphics P530 (GT2, server market)
- Intel® Iris® Pro Graphics P555 (GT3e, server market)
- Intel® Iris® Pro Graphics P580 (GT4e, server market)
- Codename Apollolake:
- Intel® HD Graphics 500
- Intel® HD Graphics 505
- Codename Kabylake:
- Intel® HD Graphics 610 (GT1, client market)
- Intel® HD Graphics 615 (GT2, client market)
- Intel® HD Graphics 620 (GT2, client market)
- Intel® HD Graphics 630 (GT2, client market)
- Intel® Iris® Graphics 640 (GT3e, client market)
- Intel® Iris® Graphics 650 (GT3e, client market)
- Intel® HD Graphics P630 (GT2, server market)
- Intel® Iris® Pro Graphics 630 (GT2, server market)
clDNN currently uses OpenCL™ with multiple Intel® OpenCL™ extensions and requires Intel® Graphics Driver to run.
clDNN requires a CPU with Intel® SSE/Intel® AVX support.
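On Linux, the CPU requirement can be checked roughly before building. The following is a minimal sketch, not part of clDNN: the helper name `has_cpu_flag` is hypothetical, and it assumes that the `sse` and `avx` entries on the `flags` line of `/proc/cpuinfo` correspond to the SSE/AVX support mentioned above.

```shell
#!/bin/sh
# Succeeds when the given flag appears on the first "flags" line of a
# cpuinfo-style file.
# $1: flag name (e.g. sse, avx); $2: file to inspect (default: /proc/cpuinfo).
has_cpu_flag() {
    grep -m 1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

# Assumption: these two flag names are the ones relevant to the requirement.
for flag in sse avx; do
    if has_cpu_flag "$flag"; then
        echo "$flag: supported"
    else
        echo "$flag: not reported"
    fi
done
```

The word match (`-w`) keeps `avx` from matching `avx2` or `avx512f`, so each flag is reported on its own.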
The software dependencies are:
- CMake* 3.9 or later
(The project is compatible with CMake 3.1, but due to issues with boost library resolution in CMake 3.4.3, issues with the CheckCXXCompilerFlag module in CMake 3.5.2, and a hard dependency of the supported boost version on the CMake version, we strongly recommend 3.9 or later.)
NOTE: In the rare situation when updating CMake is not possible, you can try to update or override only the FindBoost.cmake module. Download the FindBoost.cmake file from a newer version of CMake (e.g. from here) and put it into the common/boost/cmake/modules directory (create it if necessary). This directory is added to the list of modules if your CMake version is lower than 3.9.
- C++ compiler with partial or full C++11 standard support compatible with:
- GNU* Compiler Collection 4.8.2
- clang 3.5 or later
- Intel® C++ Compiler 17.0 or later
- Visual C++ 2015 (MSVC++ 19.0) or later
Intel® CPU intrinsics header (<immintrin.h>) must be available during compilation.
- python™ 2.7 or later (scripts are compatible with both python™ 2.7.x and python™ 3.x)
- (optional) Doxygen* 1.8.13 or later
Needed for manual generation of documentation from inline comments or for running the docs custom target, which generates it automatically.
GraphViz* (2.38 or later) is also recommended to generate documentation with all embedded diagrams.
(Make sure that the dot application is visible in the PATH environment variable.)
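The dependency list above can be checked with a short script before configuring the build. This is only a sketch under stated assumptions: the helper names `version_ge` and `have_tool` are hypothetical, `sort -V` requires GNU coreutils, and only CMake, Doxygen and dot are checked (compiler and Python checks are left out).

```shell
#!/bin/sh
# Succeeds when version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort), which is provided by GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when the named program can be found through PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool cmake; then
    # The first line of `cmake --version` has the form "cmake version X.Y.Z".
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" 3.9; then
        echo "cmake $found: OK"
    else
        echo "cmake $found: older than the recommended 3.9"
    fi
else
    echo "cmake: not found"
fi

# Optional tools, needed only for generating the documentation.
for tool in doxygen dot; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found (optional)"
    fi
done
```

If the script reports a CMake older than 3.9, the FindBoost.cmake override described in the NOTE above is the fallback.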
The software was validated on:
- CentOS* 7.2 with GNU* Compiler Collection 5.2 (64-bit only), using Intel® Graphics Compute Runtime for OpenCL™.
- Windows® 10 and Windows® Server 2012 R2 with MSVC 14.0, using Intel® Graphics Driver for Windows* [24.20] driver package.
More information on Intel® OpenCL™ drivers can be found here.
We recommend using the latest driver for Linux (link) and the 24.20 driver for Windows (link).
Download clDNN source code or clone the repository to your system:
git clone https://github.com/intel/cldnn.git
Satisfy all software dependencies and ensure that the versions are correct before building.
clDNN uses multiple 3rd-party components. They are stored in binary form in the common subdirectory. Currently they are prepared for MSVC++ and GCC*. They will be cloned with the repository.
clDNN uses a CMake-based build system. You can use the CMake command-line tool or CMake GUI (cmake-gui) to generate the required solution.
For Windows systems, you can call in cmd (or powershell):
@REM Generate 32-bit solution (solution contains multiple build configurations)...
cmake -E make_directory build && cd build && cmake -G "Visual Studio 14 2015" ..
@REM Generate 64-bit solution (solution contains multiple build configurations)...
cmake -E make_directory build && cd build && cmake -G "Visual Studio 14 2015 Win64" ..
The created solution can be opened in Visual Studio 2015 or built using the appropriate msbuild tool (you can also use cmake --build . to select the build tool automatically).
For Unix and Linux systems:
# Create GNU makefile for release clDNN and build it...
cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make
# Create Ninja makefile for debug clDNN and build it...
cmake -E make_directory build && cd build && cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug .. && ninja -k 20
You can also call scripts in the main directory of the project, which will create solutions/makefiles for clDNN (they will generate solutions/makefiles in the build subdirectory, and binary outputs will be written to the build/out subdirectory):
- create_msvc_mscc.bat (Windows*, Visual Studio* 2015)
- create_unixmake_gcc.sh [Y|N] [<devtoolset-version>] (Linux*, GNU* or Ninja* makefiles, optional devtoolset support)
  - If you specify the first parameter as Y, Ninja makefiles will be generated.
  - If you specify the second parameter (a number), CMake will be called via scl with the selected devtoolset version.
The CMake solution offers multiple options, which you can specify using normal CMake syntax (-D<option-name>=<value>):
CMake option | Type | Description |
---|---|---|
CMAKE_BUILD_TYPE | STRING | Build configuration that will be used by generated makefiles (it does not affect multi-configuration generators like generators for Visual Studio solutions). Currently supported: Debug (default), Release |
CMAKE_INSTALL_PREFIX | PATH | Install directory prefix. |
CLDNN__ARCHITECTURE_TARGET | STRING | Architecture of target system (where binary output will be deployed). CMake will try to detect it automatically (based on selected generator type, host OS and compiler properties). Specify this option only if CMake has problem with detection. Currently supported: Windows32, Windows64, Linux64 |
CLDNN__OUTPUT_DIR (CLDNN__OUTPUT_BIN_DIR, CLDNN__OUTPUT_LIB_DIR) | PATH | Location where built artifacts will be written to. It is set automatically to roughly build/out/<arch-target>/<build-type> subdirectory. For more control use: CLDNN__OUTPUT_LIB_DIR (specifies output path for static libraries) or CLDNN__OUTPUT_BIN_DIR (for shared libs and executables). |

CMake advanced option | Type | Description |
---|---|---|
PYTHON_EXECUTABLE | FILEPATH | Path to Python interpreter. CMake will try to detect Python. Specify this option only if CMake has problem with locating Python. |
CLDNN__BOOST_VERSION | STRING | Version of boost prebuilt binaries to use (from the common subdirectory). It is automatically selected by CMake (highest version). Specify it if you have multiple versions and want to use a different one than the automatically selected version. |
CLDNN__IOCL_ICD_USE_EXTERNAL | BOOL | Use this option to enable use of external Intel® OpenCL™ SDK as a source for ICD binaries and headers (based on INTELOCLSDKROOT environment variable). Default: OFF |
CLDNN__IOCL_ICD_VERSION | STRING | Version of Intel® OpenCL™ ICD binaries and headers to use (from the common subdirectory). It is automatically selected by CMake (highest version). Specify it if you have multiple versions and want to use a different one than the automatically selected version. |
CLDNN__COMPILE_LINK_ALLOW_UNSAFE_SIZE_OPT | BOOL | Allow unsafe optimizations during linking (like aggressive dead code elimination, etc.). Default: ON |
CLDNN__COMPILE_LINK_USE_STATIC_RUNTIME | BOOL | Link with static C++ runtime. Default: OFF (shared C++ runtime is used) |
CLDNN__INCLUDE_CORE | BOOL | Include core clDNN library project in generated makefiles/solutions. Default: ON |
CLDNN__INCLUDE_TESTS | BOOL | Include tests application project (based on googletest framework) in generated makefiles/solutions. Default: ON |
CLDNN__RUN_TESTS | BOOL | Run tests after building tests project. This option requires CLDNN__INCLUDE_TESTS option to be ON. Default: OFF |
CLDNN__CMAKE_DEBUG | BOOL | Enable extended debug messages in CMake. Default: OFF |
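As an illustration of how several of these options combine on one command line, the sketch below assembles a configure command. The option names come from the table above; the chosen values and the helper name `cldnn_cmake_args` are only an example, and the script prints the command instead of running it so that it can be reviewed first.

```shell
#!/bin/sh
# Print the -D arguments for a Release configuration that includes the
# tests project, does not run the tests automatically, and enables
# extended CMake debug messages. The values are illustrative only.
cldnn_cmake_args() {
    echo "-DCMAKE_BUILD_TYPE=Release" \
         "-DCLDNN__INCLUDE_TESTS=ON" \
         "-DCLDNN__RUN_TESTS=OFF" \
         "-DCLDNN__CMAKE_DEBUG=ON"
}

# Show the full command. To execute it from a build subdirectory of the
# checkout, run:  cmake $(cldnn_cmake_args) ..
echo "cmake $(cldnn_cmake_args) .."
```

Keeping the arguments in one function makes it easy to reuse the same configuration for both the Unix makefiles and the Ninja generator.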
clDNN includes unit tests implemented using the googletest framework. To validate your build, run the tests target, e.g.:
make tests
(Make sure that both CLDNN__INCLUDE_TESTS and CLDNN__RUN_TESTS were set to ON when invoking CMake.)
Documentation is provided inline and can be generated in HTML format with Doxygen. We recommend using the latest Doxygen* and GraphViz*.
Documentation templates and configuration files are stored in the docs subdirectory. You can simply call:
cd docs && doxygen
to generate HTML documentation in the docs/html subdirectory.
There is also a custom CMake target named docs, which will generate documentation in the CLDNN__OUTPUT_BIN_DIR/html directory. For example, when using Unix makefiles, you can run:
make docs
in order to create it.
The special install target will place the API header files and libraries in /usr/local (C:/Program Files/clDNN or C:/Program Files (x86)/clDNN on Windows). To change the installation path, use the option -DCMAKE_INSTALL_PREFIX=<prefix> when invoking CMake.
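A possible sequence for installing into a custom location is sketched below. It assumes a Unix makefiles or Ninja generator and a build directory one level below the checkout; the default prefix `$HOME/.local/cldnn` and the helper name `install_commands` are hypothetical. The script only prints the commands.

```shell
#!/bin/sh
# Hypothetical user-writable install location; override by setting PREFIX.
PREFIX=${PREFIX:-"$HOME/.local/cldnn"}

# Print the configure and install commands for the given prefix ($1).
install_commands() {
    printf 'cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=%s ..\n' "$1"
    printf 'cmake --build . --target install\n'
}

# Print the commands for review. They are meant to be run from the build
# directory; a prefix containing spaces would need quoting first.
install_commands "$PREFIX"
```

Installing under a user-writable prefix avoids the need for elevated permissions that the default /usr/local location usually requires.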
* Other names and brands may be claimed as the property of others.
Copyright © 2017, Intel® Corporation