mrDIMAS/hrtf

Optimize HrtfSphere::sample_bilinear

OlegOAndreev opened this issue · 2 comments

When using current HRTF renderer with IRCAM spheres in a system with optimized RustFFT implementation (right now an AVX/SSE implementation and in the nearest future Neon implementation ejmahler/RustFFT#78) most of the time in HrtfProcessor::process_samples is spent in hrtf_sphere.sample_bilinear (see attached profile: 56% percent is spent in sample_bilinear, half of time is spent searching for the face and the second h
alf is spent computing left_hrtf and right_hrtf. I've moved the computation of left_hrtf/right_hrtf to a separate inline(never) method in order to be able to see the performance impact).
Current-profile

There are two problems with process_samples:

  1. Compiler cannot optimize computing left_hrtf/right_hrtf due to values being pushed instead of assigned. I've tried a few variants in Godbolt: https://rust.godbolt.org/z/jvb4be6P6 and the only way to get proper vectorization is to pre-resize the array (the resize is replaced with memset) and either use iterators+zip or use get_unchecked_mut. I think iterators+zip is by far a cleaner solution.
  2. Too much time is spent searching for the face which the ray intersects. I've experimented with two solutions. The first is a bit complicated: partition the space with places which pass through each edge of each face and origin (0, 0, 0). This partitioning uniquely identifies a matching face. The additions amount to ~150 lines of code (without tests which, if required, would take another 100-200 lines). BSP-profile The second one is a quick-and-dirty prototype: split space in 8 octets with center in (0, 0, 0), assign all faces to octets (some faces end up in multiple octets) and thus reduce the search space. This is a short algorithms (~80 lines) but not as efficient as the first method. It can however be easily improved by splittiing the space into more partitions. Octants-profile

I've implemented both algorithms, you can see the proposed changes here https://github.com/OlegOAndreev/hrtf/commit/415d1af6dd80bbceb464f7615960c9fd87bcc12e

Cool, I'd like to see a PR :) I think the best option is the one that is less bug-prone.

Opened #5. After some pondering I feel the BSP solution is cleaner and less error-prone.