A fast BRIEF Implementation for 256-dimensional 64-bit Binary Descriptors
This is a C++ implementation for computing BRIEF binary feature descriptors.
It is approx. 8x faster than OpenCV's BRIEF implementation. The reason is that I use a 64-bit strategy as opposed to the 8-bit strategy of OpenCV. Furthermore, my code is aggressively optimized: the kernel consists mostly of packed AVX2 instructions.
- OpenCV: 2.5 ms
- this version: 0.31 ms
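To illustrate the 64-bit strategy, here is a minimal scalar sketch (not the actual kernel of this repository). It assumes that "64-bit strategy" means the 256 test results are packed into four 64-bit words rather than written byte by byte. The names `TestPair`, `Descriptor` and `describe_scalar` are hypothetical and only serve the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical test pair: indices of the two pixels to compare inside a patch.
struct TestPair {
    std::size_t a;
    std::size_t b;
};

// 4 x 64 bits = 256 bits, one bit per intensity test.
using Descriptor = std::array<std::uint64_t, 4>;

// Sets bit i of the descriptor when patch[pairs[i].a] < patch[pairs[i].b].
inline Descriptor describe_scalar(const std::uint8_t* patch,
                                  std::size_t patch_size,
                                  const std::array<TestPair, 256>& pairs) {
    Descriptor d{};  // all bits start at zero
    for (std::size_t i = 0; i < 256; ++i) {
        assert(pairs[i].a < patch_size && pairs[i].b < patch_size);
        const std::uint64_t bit = patch[pairs[i].a] < patch[pairs[i].b] ? 1u : 0u;
        d[i / 64] |= bit << (i % 64);  // 64 results share one machine word
    }
    return d;
}
```

With this layout, the Hamming distance between two descriptors needs only four XOR and popcount operations on 64-bit words.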
- 32-byte aligned plain arrays instead of STL vectors
- The Gaussian pattern is split into four contiguous 256x1 arrays instead of one 256x4 matrix, which allows vectorization.
- Complete unrolling of the i = 0 ... 256 loop
- AVX2 masking (_mm256_movemask_epi8) and comparing (_mm256_cmpeq_epi8) combined with load-and-store remove the need for bit shifting
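The points above can be sketched roughly as follows. This is an illustration under assumptions, not the code of this repository: `Pattern`, `Descriptor256` and `describe_packed` are hypothetical names, the gather step is plain scalar code, and pairing `_mm256_max_epu8` with `_mm256_cmpeq_epi8` to get an unsigned byte comparison is my guess at how the comparison might be done. The AVX2 path is only compiled when AVX2 is enabled (e.g. `-mavx2` with GCC or Clang); otherwise an equivalent scalar fallback is used.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical structure-of-arrays layout: four contiguous 256-element arrays
// instead of one 256x4 matrix, each aligned to 32 bytes.
struct Pattern {
    alignas(32) std::int32_t x1[256];
    alignas(32) std::int32_t y1[256];
    alignas(32) std::int32_t x2[256];
    alignas(32) std::int32_t y2[256];
};

struct Descriptor256 {
    alignas(32) std::uint32_t words[8];  // 8 x 32 bits = 256 bits
};

// Sets bit i when the first pixel of test i is darker than the second one.
// img is row-major with the given stride; (cx, cy) is the keypoint position.
inline Descriptor256 describe_packed(const std::uint8_t* img, int stride,
                                     int cx, int cy, const Pattern& p) {
    assert(img != nullptr && stride > 0);
    alignas(32) std::uint8_t a[256];
    alignas(32) std::uint8_t b[256];
    for (int i = 0; i < 256; ++i) {  // scalar gather of both pixels per test
        a[i] = img[(cy + p.y1[i]) * stride + cx + p.x1[i]];
        b[i] = img[(cy + p.y2[i]) * stride + cx + p.x2[i]];
    }
    Descriptor256 d;
    for (int k = 0; k < 8; ++k) {  // 32 tests per iteration
#if defined(__AVX2__)
        const __m256i va =
            _mm256_load_si256(reinterpret_cast<const __m256i*>(a + 32 * k));
        const __m256i vb =
            _mm256_load_si256(reinterpret_cast<const __m256i*>(b + 32 * k));
        // AVX2 has no unsigned byte "less than": max(a, b) == a marks a >= b.
        const __m256i ge = _mm256_cmpeq_epi8(_mm256_max_epu8(va, vb), va);
        // movemask collects one bit per byte lane; inverting it yields a < b.
        // The 32-bit mask is stored as a whole word, so no per-bit shifting.
        d.words[k] = ~static_cast<std::uint32_t>(_mm256_movemask_epi8(ge));
#else
        std::uint32_t w = 0;  // scalar fallback producing the same bit layout
        for (int j = 0; j < 32; ++j) {
            w |= static_cast<std::uint32_t>(a[32 * k + j] < b[32 * k + j]) << j;
        }
        d.words[k] = w;
#endif
    }
    return d;
}
```

The eight iterations of the outer loop have a fixed trip count, so they are a natural candidate for complete unrolling.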
- Unroll loops by pre-processor
- Try to relieve the memory-bound bottleneck with AVX2 gather instructions (scatter instructions are only available from AVX-512 onward)
- m4 (macro pre-processor)
- Intel ispc SIMD compiler
Have a look at the CMake file and adjust the paths to match your setup before running it.
Feel free to contact me if you have questions or just want to chat about it.