FastFilter/xorfilter

Streaming use cases

gurre opened this issue · 2 comments

gurre commented

Great algorithm you got there. The API requires a slice to be passed in to populate. If I'm interested in adding entities on the fly without having to load/allocate my entire dataset, would this be possible?

Unfortunately the construction of an individual xor filter is "all or nothing". What would work is to use a buffer to store a list of 64 bit hash values. (If the buffer is large, there might be a few duplicates, but that wouldn't be a problem as we anyway have to deal with false positives.) Once the buffer is full, an xor filter can be built from that. In case you want to stream more entries than fit in a buffer, I wonder if it makes sense to combine multiple xor filters. Lookup would be slower. An alternative is to use temp files to sort the keys (they don't need to be fully sorted; approximate sorting is fine).

@gurre You need to pass a slice of 64-bit values, so basically hash values. It is likely that if you cannot afford 8 bytes per entry, building the filter in RAM will prove challenging.

I am going to close this issue.