Optimize HrtfSphere::sample_bilinear
OlegOAndreev opened this issue · 2 comments
When using current HRTF renderer with IRCAM spheres in a system with optimized RustFFT implementation (right now an AVX/SSE implementation and in the nearest future Neon implementation ejmahler/RustFFT#78) most of the time in HrtfProcessor::process_samples
is spent in hrtf_sphere.sample_bilinear
(see attached profile: 56% percent is spent in sample_bilinear
, half of time is spent searching for the face and the second h
alf is spent computing left_hrtf
and right_hrtf
. I've moved the computation of left_hrtf
/right_hrtf
to a separate inline(never)
method in order to be able to see the performance impact).
There are two problems with process_samples
:
- Compiler cannot optimize computing
left_hrtf
/right_hrtf
due to values being pushed instead of assigned. I've tried a few variants in Godbolt: https://rust.godbolt.org/z/jvb4be6P6 and the only way to get proper vectorization is to pre-resize the array (the resize is replaced with memset) and either use iterators+zip or useget_unchecked_mut
. I think iterators+zip is by far a cleaner solution. - Too much time is spent searching for the face which the ray intersects. I've experimented with two solutions. The first is a bit complicated: partition the space with places which pass through each edge of each face and origin (0, 0, 0). This partitioning uniquely identifies a matching face. The additions amount to ~150 lines of code (without tests which, if required, would take another 100-200 lines). The second one is a quick-and-dirty prototype: split space in 8 octets with center in (0, 0, 0), assign all faces to octets (some faces end up in multiple octets) and thus reduce the search space. This is a short algorithms (~80 lines) but not as efficient as the first method. It can however be easily improved by splittiing the space into more partitions.
I've implemented both algorithms, you can see the proposed changes here https://github.com/OlegOAndreev/hrtf/commit/415d1af6dd80bbceb464f7615960c9fd87bcc12e
Cool, I'd like to see a PR :) I think the best option is the one that is less bug-prone.
Opened #5. After some pondering I feel the BSP solution is cleaner and less error-prone.