Deep Fusion

Deep Fusion is a library of deep-fused inference optimization primitives for Intel Xeon E5 platforms. Much of the code is borrowed from MKL-DNN.

Build & Use

How to Build

$ mkdir -p build && cd build
$ cmake ..
$ make -j `nproc`
$ make test
$ make install

The build downloads, compiles, and installs all dependencies automatically, including:

Xbyak, used for JIT kernels.
Intel(R) MKL-DNN, used for benchmark comparison and as the gtest reference.
Intel(R) MKLML for Intel OpenMP library.
gtest
gflags

How to Benchmark

Add "-DWITH_BENCHMARK=ON" to the cmake command. Once the build is done, you can run a benchmark with:

$ bash ./build/benchmark/bench_concat

How to Profile

Add "-DWITH_VERBOSE=ON" to the cmake command, and export the environment variable below:

$ export DEEPFUSION_VERBOSE=1

The WITH_VERBOSE option is enabled in Debug and disabled in Release by default.
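
To turn on verbose output for a single run rather than the whole shell session, the variable can be set for just one command. The sketch below wraps that in a small helper. The name df_run_verbose is hypothetical and not something deepfusion provides, and the benchmark path in the usage comment assumes a build configured with both WITH_BENCHMARK and WITH_VERBOSE.

```shell
# Hypothetical helper (not part of deepfusion): run one command with
# DEEPFUSION_VERBOSE=1 set only in that command's environment.
df_run_verbose() {
  env DEEPFUSION_VERBOSE=1 "$@"
}

# Usage (assumes the benchmark binary has been built):
#   df_run_verbose ./build/benchmark/bench_concat
```

Because the variable is passed through env, it does not stay set in the calling shell after the command exits.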

How to Dump Code

Add "-DWITH_DUMP_CODE=ON" to the cmake command, and export the environment variable below:

$ export DEEPFUSION_DUMP_CODE=1

The WITH_DUMP_CODE option is enabled in Debug and disabled in Release by default.

Then, when you run an application, you get files such as jit_dump_jit_concat_kernel.0.bin. You can use xed to inspect the assembly. For example:

$ xed -ir jit_dump_jit_concat_kernel.0.bin
XDIS 0: PUSH      BASE       53                       push ebx
XDIS 1: PUSH      BASE       55                       push ebp
XDIS 2: BINARY    BASE       41                       inc ecx
XDIS 3: PUSH      BASE       54                       push esp
XDIS 4: BINARY    BASE       41                       inc ecx
XDIS 5: PUSH      BASE       55                       push ebp
XDIS 6: BINARY    BASE       41                       inc ecx
XDIS 7: PUSH      BASE       56                       push esi
XDIS 8: BINARY    BASE       41                       inc ecx
XDIS 9: PUSH      BASE       57                       push edi
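
When a run produces several dump files, a short loop can disassemble all of them at once. This is only a sketch: df_disasm_dumps is a hypothetical helper name, it assumes the dump files follow the jit_dump_*.bin naming seen above, and it simply lists the files without disassembling them if xed is not on the PATH.

```shell
# Hypothetical helper (not part of deepfusion): print every JIT dump file
# found in a directory and, if xed is installed, write its disassembly to
# a .asm file next to the dump.
df_disasm_dumps() (
  dir="${1:-.}"
  for f in "$dir"/jit_dump_*.bin; do
    [ -e "$f" ] || continue            # the glob matched nothing
    echo "$f"
    if command -v xed >/dev/null 2>&1; then
      # Same xed invocation as in the example above.
      xed -ir "$f" > "${f%.bin}.asm" || echo "xed failed on $f" >&2
    fi
  done
)

# Usage: df_disasm_dumps            (current directory)
#        df_disasm_dumps /path/to/run/dir
```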

Generate MinSizeRel

This generates only the deepfusion library, without any benchmark utilities or gtests.

$ cmake .. -DCMAKE_BUILD_TYPE=MinSizeRel
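
The build type and the WITH_* switches described in the sections above can be combined in a single configure step. As a rough sketch, the helper below prints such a cmake command line without running it. The name df_cmake_cmd is hypothetical and not part of deepfusion, and it only knows the option names that this README mentions.

```shell
# Hypothetical helper (not part of deepfusion): print the cmake configure
# command for a build type plus a list of WITH_* switches to turn ON.
df_cmake_cmd() (
  build_type="$1"
  shift
  cmd="cmake .. -DCMAKE_BUILD_TYPE=${build_type}"
  for opt in "$@"; do
    cmd="$cmd -D${opt}=ON"
  done
  printf '%s\n' "$cmd"
)

# Print (do not run) a Release configure line with benchmarks and
# verbose profiling enabled.
df_cmake_cmd Release WITH_BENCHMARK WITH_VERBOSE
```

Printing the command first makes it easy to check the flags before pasting the line into the build directory.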

Supported Operators

concat+relu fused op (AVX/AVX2/AVX512)
conv3x3+relu+conv1x1+relu fused op (AVX512)
conv+relu+pooling fused op
eltwise-sum + relu fused op

Supported Data Types

op	data_in	weight	bias	scale	data_out
concat+relu	u8/s8/s32/f32	N/A	N/A	N/A	u8/s8/s32/f32
conv3x3+relu+conv1x1+relu	u8	s8	u8/s8/s32/f32	f32	u8/s8/s32/f32

zhouhuan2005/deep-fusion