
Deep Fusion

a deep-fused inference optimization primitives library for Intel Xeon E5 platforms; much of the code is borrowed from MKL-DNN.

Build & Use

How to Build

$ mkdir -p build && cd build
$ cmake ..
$ make -j `nproc`
$ make test
$ make install

It will automatically download, compile, and install all dependencies.

How to Benchmark

Add "-DWITH_BENCHMARK=ON" to the cmake command. Once the build is done, you can run:

$ bash ./build/benchmark/bench_concat

How to Profile

Add "-DWITH_VERBOSE=ON" to the cmake command, and export the environment variable below:

$ export DEEPFUSION_VERBOSE=1

The WITH_VERBOSE option is enabled in Debug and disabled in Release by default.

How to Dump Code

Add "-DWITH_DUMP_CODE=ON" to the cmake command, and export the environment variable below:

$ export DEEPFUSION_DUMP_CODE=1

The WITH_DUMP_CODE option is enabled in Debug and disabled in Release by default.

Then, when you run an app, you will get files like jit_dump_jit_concat_kernel.0.bin. You can use xed to inspect the generated assembly. For example:

$ xed -ir jit_dump_jit_concat_kernel.0.bin
XDIS 0: PUSH      BASE       53                       push ebx
XDIS 1: PUSH      BASE       55                       push ebp
XDIS 2: BINARY    BASE       41                       inc ecx
XDIS 3: PUSH      BASE       54                       push esp
XDIS 4: BINARY    BASE       41                       inc ecx
XDIS 5: PUSH      BASE       55                       push ebp
XDIS 6: BINARY    BASE       41                       inc ecx
XDIS 7: PUSH      BASE       56                       push esi
XDIS 8: BINARY    BASE       41                       inc ecx
XDIS 9: PUSH      BASE       57                       push edi

Generate MinSizeRel

This will generate only the deepfusion library, without any benchmark utilities or gtests:

cmake .. -DCMAKE_BUILD_TYPE=MinSizeRel

Supported Operators

  • concat+relu fused op (AVX/AVX2/AVX512)
  • conv3x3+relu+conv1x1+relu fused op (AVX512)
  • conv+relu+pooling fused op
  • eltwise-sum + relu fused op
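To make the fusion semantics concrete, here is a scalar reference for the simplest fused op, concat+relu: concatenate two tensors along the channel axis and clamp negatives in one pass. This is an illustration of what the op computes, not the library's JIT AVX/AVX512 kernel, which does the same work vectorized and without an intermediate concat buffer:

```cpp
#include <algorithm>
#include <vector>

// Scalar reference for concat+relu over two f32 inputs: the fused op
// writes relu(x) for every element of a, then of b, into one output.
std::vector<float> concat_relu(const std::vector<float> &a,
                               const std::vector<float> &b) {
    std::vector<float> out;
    out.reserve(a.size() + b.size());
    for (float v : a) out.push_back(std::max(v, 0.f));
    for (float v : b) out.push_back(std::max(v, 0.f));
    return out;
}
```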

Supported Data Types

op                         data_in        weight  bias           scale  data_out
concat+relu                u8/s8/s32/f32  N/A     N/A            N/A    u8/s8/s32/f32
conv3x3+relu+conv1x1+relu  u8             s8      u8/s8/s32/f32  f32    u8/s8/s32/f32
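The f32 scale column reflects standard int8 inference math: the conv accumulates u8 inputs against s8 weights into an s32 accumulator, which the scale maps back toward real values before the result is saturated to the output type. A sketch of that final step for a u8 output with fused relu (generic int8 requantization, not the exact deep-fusion code path):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Requantize one s32 conv accumulator to u8: apply the f32 scale,
// clamp negatives (the fused relu), saturate to [0, 255], and round.
uint8_t requantize_u8(int32_t acc, float scale) {
    float v = static_cast<float>(acc) * scale;  // dequantize accumulator
    v = std::max(v, 0.f);                       // fused relu
    v = std::min(v, 255.f);                     // saturate to u8 range
    return static_cast<uint8_t>(std::nearbyint(v));
}
```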