VOLK (libvolk) optimization
jj1bdx opened this issue · 11 comments
More code optimization to fully utilize VOLK (libvolk), not only for ARM NEON, but also for x86.
See #10
VOLK (libvolk) has the following issues:
- macOS SDK glitch when building with Homebrew gcc 9 (9.2)
- volk_profile crashing issue (a workaround)
@bstalk Let us know if you have any tips on using libvolk more effectively for airspy-fmradion. Thx in advance!
libvolk does not have a test for volk_32f_s32f_32f_fm_detect_32f yet. volk_profile does not generate output for this function.
Maybe the following line needs to be added to lib/kernel_tests.h (is this a bug?):
QA(VOLK_INIT_TEST(volk_32f_s32f_32f_fm_detect_32f, test_params))
Then rebuild, test, and install:
cmake ..
make
make test
sudo make install
The following is part of the volk_profile output after patching:
RUN_VOLK_TESTS: volk_32f_s32f_32f_fm_detect_32f(131071,1987)
a_avx completed in 104.64 ms
a_sse completed in 120.689 ms
generic completed in 396.686 ms
u_avx completed in 107.013 ms
Best aligned arch: a_avx
Best unaligned arch: u_avx
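To put the timings above in perspective, here is a minimal C++ sketch that reads lines of the form "<arch> completed in <time> ms" and computes the speedup of each implementation over generic. The function names parse_profile and speedup_over_generic are hypothetical helpers for illustration, not part of libvolk, and the line format is assumed from the output quoted above.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of libvolk): parse lines of the form
// "<arch> completed in <time> ms" and return arch -> time in milliseconds.
// Lines that do not match this shape are ignored.
std::map<std::string, double> parse_profile(const std::string& text) {
    std::map<std::string, double> times;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string arch, completed, in, unit;
        double ms = 0.0;
        if (fields >> arch >> completed >> in >> ms >> unit &&
            completed == "completed" && in == "in" && unit == "ms") {
            times[arch] = ms;
        }
    }
    return times;
}

// Speedup of `arch` relative to the generic implementation.
// Values above 1 mean `arch` is faster. Returns 0 if either entry is missing.
double speedup_over_generic(const std::map<std::string, double>& times,
                            const std::string& arch) {
    const auto generic = times.find("generic");
    const auto target = times.find(arch);
    if (generic == times.end() || target == times.end() ||
        target->second <= 0.0) {
        return 0.0;
    }
    return generic->second / target->second;
}
```

With the numbers quoted above, a_avx comes out roughly 3.8 times faster than generic, so this function looks like a worthwhile target on x86.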
@bstalk Thx for the testing result of volk_32f_s32f_32f_fm_detect_32f.
Maybe we need to test the performance increase on x86_64 platforms by adding the line volk_32f_s32f_32f_fm_detect_32f a_avx u_avx to ~/.volk/volk_config.
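As a rough illustration of checking such an entry, the sketch below parses one volk_config line into its three fields. It assumes the "kernel aligned_arch unaligned_arch" layout shown above and assumes that lines starting with # are comments. KernelChoice and parse_config_line are hypothetical names, not libvolk API.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical record for one volk_config entry (not a libvolk type).
struct KernelChoice {
    std::string kernel;          // e.g. volk_32f_s32f_32f_fm_detect_32f
    std::string aligned_arch;    // e.g. a_avx
    std::string unaligned_arch;  // e.g. u_avx
};

// Parse one line of the assumed form "<kernel> <aligned> <unaligned>".
// Returns std::nullopt for blank lines, short lines, and lines that
// start with '#' (assumed here to be comments).
std::optional<KernelChoice> parse_config_line(const std::string& line) {
    std::istringstream fields(line);
    KernelChoice choice;
    if (!(fields >> choice.kernel >> choice.aligned_arch >>
          choice.unaligned_arch)) {
        return std::nullopt;
    }
    if (choice.kernel.front() == '#') {
        return std::nullopt;
    }
    return choice;
}
```

Running this over each line of ~/.volk/volk_config and looking for the kernel name would show which implementations are currently selected for it.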
@bstalk
I've found the entry in volk_config for volk_32f_s32f_32f_fm_detect_32f, so I guess the function is already activated (with AVX for x86).
I've also noticed the following line in lib/kernel_tests.h, though I don't know what it really does:
QA(VOLK_INIT_PUPP(volk_32f_x2_fm_detectpuppet_32f, volk_32f_s32f_32f_fm_detect_32f, test_params))
As I read the source:
test_script -> volk_32f_x2_fm_detectpuppet_32f.sh ... puppet func
volk_config -> volk_32f_s32f_32f_fm_detect_32f ... master_func name of puppet.
Sorry, I don't know further info now.
Tips for optimizing for libvolk:
- libvolk doesn't really have 64-bit functions, so focus first on the 32f and 32fc functions.
- Profile each function first by running volk_profile -b, and give lower priority to functions that show little speed difference between the generic and optimized (AVX, SSE, NEON) implementations.
- Use volk::vector in place of std::vector for private class members that are passed to libvolk functions. Note well, however, that volk::vector does not have the move constructor, so it won't work for the *Source drivers.
- Always verify correctness when the source and destination of an operation are the same memory address (the same vector), at least by reading the generic implementation.
- If a set of operations requires temporary storage in a std::vector or volk::vector, the speed gain will be limited.
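To illustrate the in-place caveat in the tips above, here is a minimal sketch of a difference-of-neighbors kernel, where each output depends on the previous input sample. It is loosely modeled on the kind of operation an FM detector performs, but it is not libvolk code, and the names phase_diff and in_place_is_safe are hypothetical. The point is only that such a kernel can silently give different results when the output buffer aliases the input.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel (not libvolk code): out[i] = in[i] - in[i-1],
// with `prev` standing in for the sample that precedes in[0].
void phase_diff(float* out, const float* in, float prev, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float last = (i == 0) ? prev : in[i - 1];
        out[i] = in[i] - last;
    }
}

// Returns true when running the kernel in place produces the same
// result as running it into a separate output buffer.
bool in_place_is_safe(const std::vector<float>& input) {
    std::vector<float> separate(input.size());
    phase_diff(separate.data(), input.data(), 0.0f, input.size());

    // Source and destination share the same memory here.
    std::vector<float> aliased = input;
    phase_diff(aliased.data(), aliased.data(), 0.0f, aliased.size());

    return separate == aliased;
}
```

For an input such as {1, 3, 6, 10}, the in-place run overwrites in[i-1] before it is read for out[i], so the two results differ. A comparison like this against a separate-buffer run is a cheap check to add before reusing a vector as both source and destination.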
Note: I've also reviewed libsoxr and found that it uses 32- or 64-bit optimized instructions by default, at least on macOS, so I guess no further review of the optimization is required for libsoxr.
VOLK works OK with Xcode 11.3 CLT running on macOS.