A library of deep-fused inference optimization primitives for Intel Xeon E5 platforms. Much of the code is borrowed from MKL-DNN.
$ mkdir -p build && cd build
$ cmake ..
$ make -j `nproc`
$ make test
$ make install
The build automatically downloads, compiles, and installs all dependencies, including:
- Xbyak, used for JIT kernels.
- Intel(R) MKL-DNN, used for benchmark comparison and as the gtest reference.
- Intel(R) MKLML, used for the Intel OpenMP library.
- gtest
- gflags
Add "-DWITH_BENCHMARK=ON" to the cmake command. Once the build is done, you can run a benchmark with:
$ bash ./build/benchmark/bench_concat
Add "-DWITH_VERBOSE=ON" to the cmake command, and export the environment variable below:
$ export DEEPFUSION_VERBOSE=1
The WITH_VERBOSE option is enabled in Debug and disabled in Release by default.
Add "-DWITH_DUMP_CODE=ON" to the cmake command, and export the environment variable below:
$ export DEEPFUSION_DUMP_CODE=1
The WITH_DUMP_CODE option is enabled in Debug and disabled in Release by default.
Then, when you run an application, it writes files such as jit_dump_jit_concat_kernel.0.bin. You can use xed to inspect the assembly. For example:
$ xed -ir jit_dump_jit_concat_kernel.0.bin
XDIS 0: PUSH BASE 53 push ebx
XDIS 1: PUSH BASE 55 push ebp
XDIS 2: BINARY BASE 41 inc ecx
XDIS 3: PUSH BASE 54 push esp
XDIS 4: BINARY BASE 41 inc ecx
XDIS 5: PUSH BASE 55 push ebp
XDIS 6: BINARY BASE 41 inc ecx
XDIS 7: PUSH BASE 56 push esi
XDIS 8: BINARY BASE 41 inc ecx
XDIS 9: PUSH BASE 57 push edi
To generate only the deepfusion library, without any benchmark utilities or gtests, use the MinSizeRel build type:
$ cmake .. -DCMAKE_BUILD_TYPE=MinSizeRel
- concat+relu fused op (AVX/AVX2/AVX512)
- conv3x3+relu+conv1x1+relu fused op (AVX512)
- conv+relu+pooling fused op
- eltwise-sum + relu fused op
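
To illustrate what "fused" means for the first op in the list above, here is a minimal reference sketch in plain C++. It is not the deepfusion API: the function name `concat_relu_ref` is hypothetical, and it assumes each input is one flat buffer, which corresponds to a channel-wise concat of a single image in NCHW layout. ReLU is applied while the data is copied, so the output is written once and no separate ReLU pass is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper, not part of the deepfusion API.
// Concatenates the inputs end to end and applies ReLU during the copy,
// so each output element is written exactly once.
template <typename T>
std::vector<T> concat_relu_ref(const std::vector<std::vector<T>>& inputs) {
    // Total number of elements across all inputs.
    std::size_t total = 0;
    for (const auto& in : inputs) total += in.size();

    std::vector<T> out;
    out.reserve(total);
    for (const auto& in : inputs) {
        for (T v : in) {
            // ReLU fused into the copy: negative values become zero.
            out.push_back(std::max(v, T(0)));
        }
    }
    assert(out.size() == total);
    return out;
}
```

An unfused version would first copy every input into the output and then run a second loop over the output for ReLU. The JIT kernels presumably vectorize this single pass with AVX/AVX2/AVX512 instructions, which the scalar sketch does not attempt.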
op | data_in | weight | bias | scale | data_out |
---|---|---|---|---|---|
concat+relu | u8/s8/s32/f32 | N/A | N/A | N/A | u8/s8/s32/f32 |
conv3x3+relu+conv1x1+relu | u8 | s8 | u8/s8/s32/f32 | f32 | u8/s8/s32/f32 |
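
The data types in the conv row can be read as a typical int8 inference pipeline: u8 data multiplied by s8 weights, accumulated in s32, then scaled by an f32 factor and converted to the output type. The sketch below computes one output value of a 1x1 convolution followed by ReLU with a u8 result. It is only an illustration under stated assumptions: the function name `conv1x1_relu_u8_ref` is hypothetical, an s32 bias is chosen from the allowed bias types, and the rounding (nearest) and saturation (clamp to 255) behavior are assumptions, not taken from the library.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, not part of the deepfusion API.
// Computes one output value of a 1x1 convolution over the input channels.
inline std::uint8_t conv1x1_relu_u8_ref(const std::vector<std::uint8_t>& data,
                                        const std::vector<std::int8_t>& weight,
                                        std::int32_t bias, float scale) {
    // Accumulate u8 * s8 products in s32, starting from the bias.
    std::int32_t acc = bias;
    const std::size_t n = std::min(data.size(), weight.size());
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::int32_t>(data[i]) *
               static_cast<std::int32_t>(weight[i]);
    }
    // Apply the f32 scale, then ReLU.
    float scaled = static_cast<float>(acc) * scale;
    scaled = std::max(scaled, 0.0f);
    // Assumed behavior: saturate to the u8 range and round to nearest.
    scaled = std::min(scaled, 255.0f);
    return static_cast<std::uint8_t>(std::nearbyint(scaled));
}
```

For example, with data {10, 20}, weights {2, -1}, bias 4 and scale 0.5, the accumulator is 4 + 20 - 20 = 4 and the result is 2.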