starkdg/hftrie

Performance idea: chunk values in leaf nodes and do linear lookup within them


Hi there,

I'm opening this issue as a follow-up to your comment on HN a few months ago.

My own cppbktree Python module implements a straightforward BK-tree, which I found to be too slow, especially when looking up millions of hashes in a database of billions. Exact lookup is not a problem, but lookup within a distance of around 10-16 out of 64 bits, which is what I would need, is.

I found some motivation to revisit this because I was wondering how fast a simple linear lookup could be, maybe even using SIMD, since for large distances it seemed that almost all BK-tree nodes would be visited anyway. It turns out that my hand-crafted AVX SIMD loop is 10% slower than a simple loop over 64-bit values using xor and popcnt. But even that loop is ~300x faster than the BK-tree version for large distances and large datasets. The following figure shows the simple linear lookup (solid lines) competing with the BK-tree; the BK-tree can only compete for very short distances.

[Figure: compare-scalings-cppbktree-linear-lookup — scaling of the simple linear lookup vs. the BK-tree]
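For reference, the xor/popcnt loop I am referring to is essentially the following. This is only a minimal sketch: the function name, the use of C++20 `std::popcount`, and the container choice are illustrative and not taken from cppbktree.

```cpp
#include <bit>       // std::popcount (C++20)
#include <cstdint>
#include <vector>

// Scan all hashes and keep those within max_distance Hamming distance of query.
std::vector<uint64_t> linear_lookup(const std::vector<uint64_t>& hashes,
                                    uint64_t query, int max_distance) {
    std::vector<uint64_t> matches;
    for (uint64_t h : hashes) {
        // Hamming distance of two 64-bit hashes: XOR, then count set bits.
        if (std::popcount(h ^ query) <= max_distance) {
            matches.push_back(h);
        }
    }
    return matches;
}
```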

The idea now would be to mix a BK-tree with linear lookup by implementing a kind of chunking per leaf node. This way, I can combine the ~1000x faster speed for large distances with the fast BK-tree speed for near-exact lookups. This works quite well:

[Figure: compare-scalings-cppbktree-vs-chunked — scaling of the plain BK-tree vs. the chunked leaf-node variant]
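In sketch form, the chunked variant is a BK-tree whose leaves buffer hashes and are scanned linearly, splitting into ordinary BK-tree children only once they overflow. The code below only illustrates that idea and is not the actual cppbktree implementation; the node layout and the leaf capacity are placeholders.

```cpp
#include <bit>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct ChunkedBKNode {
    static constexpr size_t LEAF_CAPACITY = 65536;   // placeholder threshold

    uint64_t pivot = 0;
    bool is_leaf = true;
    std::vector<uint64_t> bucket;                             // only used while a leaf
    std::map<int, std::unique_ptr<ChunkedBKNode>> children;   // keyed by distance to pivot

    void insert(uint64_t h) {
        if (is_leaf) {
            if (bucket.empty()) pivot = h;
            bucket.push_back(h);
            if (bucket.size() > LEAF_CAPACITY) split();
            return;
        }
        int d = std::popcount(h ^ pivot);
        auto& child = children[d];
        if (!child) child = std::make_unique<ChunkedBKNode>();
        child->insert(h);
    }

    void split() {
        is_leaf = false;
        std::vector<uint64_t> tmp;
        tmp.swap(bucket);
        for (uint64_t h : tmp)
            if (h != pivot) insert(h);   // the pivot itself stays in this node
    }

    void query(uint64_t q, int radius, std::vector<uint64_t>& out) const {
        if (is_leaf) {
            // Linear scan of the buffered hashes: the fast path for large radii.
            for (uint64_t h : bucket)
                if (std::popcount(h ^ q) <= radius) out.push_back(h);
            return;
        }
        int d = std::popcount(q ^ pivot);
        if (d <= radius) out.push_back(pivot);
        // Standard BK-tree pruning: only children with |dist - d| <= radius can match.
        for (const auto& [dist, child] : children)
            if (dist >= d - radius && dist <= d + radius)
                child->query(q, radius, out);
    }
};
```

With a very large LEAF_CAPACITY this degenerates into a single linearly scanned bucket, and with LEAF_CAPACITY = 1 it behaves like a plain BK-tree, so the parameter interpolates between the two curves in the plots.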

I would be very interested to know whether the same idea can be applied to your code. It has a CHUNKSIZE parameter, but that seems to be something different. In the end, performance should improve if there are leaf nodes containing around 10-100k elements that are then searched linearly; a rough sketch of what I mean is below. (I'm not sure about this, but I think that would correspond to a chunk size of 16-19 bits, which is what your CHUNKSIZE variable represents, if I understand it correctly. That value would only have to be applied to the leaf nodes, while the intermediary nodes keep using a CHUNKSIZE of 4.)
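To make that concrete, here is a hypothetical sketch of what chunked leaves could look like in a trie over 64-bit hashes: interior nodes branch on 4-bit chunks, and a subtree only splits once its leaf bucket grows past a threshold, below which the bucket is scanned linearly. None of the names, the fanout logic, or the threshold below are taken from hftrie; this is only meant to illustrate the shape of the suggestion.

```cpp
#include <array>
#include <bit>
#include <cstdint>
#include <memory>
#include <vector>

struct TrieNode {
    static constexpr size_t LEAF_LIMIT = 65536;            // placeholder threshold
    std::array<std::unique_ptr<TrieNode>, 16> children{};  // 4-bit fanout
    std::vector<uint64_t> bucket;                          // leaf contents
    bool is_leaf = true;

    void insert(uint64_t h, unsigned depth = 0) {
        if (is_leaf) {
            bucket.push_back(h);
            if (bucket.size() > LEAF_LIMIT && depth < 16) split(depth);
            return;
        }
        unsigned nibble = (h >> (60 - 4 * depth)) & 0xF;   // next 4-bit chunk
        if (!children[nibble]) children[nibble] = std::make_unique<TrieNode>();
        children[nibble]->insert(h, depth + 1);
    }

    void split(unsigned depth) {
        is_leaf = false;
        std::vector<uint64_t> tmp;
        tmp.swap(bucket);
        for (uint64_t h : tmp) insert(h, depth);   // redistribute into children
    }

    // Range query: prune interior subtrees by accumulated nibble distance,
    // scan leaf buckets linearly.
    void query(uint64_t q, int radius, std::vector<uint64_t>& out,
               unsigned depth = 0, int mismatches = 0) const {
        if (is_leaf) {
            for (uint64_t h : bucket)
                if (std::popcount(h ^ q) <= radius) out.push_back(h);
            return;
        }
        unsigned qnib = (q >> (60 - 4 * depth)) & 0xF;
        for (unsigned n = 0; n < 16; ++n) {
            if (!children[n]) continue;
            // Differing bits seen so far are a lower bound on the full distance.
            int d = mismatches + std::popcount(n ^ qnib);
            if (d <= radius) children[n]->query(q, radius, out, depth + 1, d);
        }
    }
};
```

LEAF_LIMIT here plays the role I ascribe to a larger leaf-level CHUNKSIZE above: interior nodes keep the 4-bit fanout, while leaves absorb everything below the threshold and rely on the linear popcount scan.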