Streaming use cases
gurre opened this issue · 2 comments
Great algorithm you got there. The API requires a slice to be passed in to populate. If I'm interested in adding entities on the fly without having to load/allocate my entire dataset, would this be possible?
Unfortunately the construction of an individual xor filter is "all or nothing". What would work is to use a buffer to store a list of 64 bit hash values. (If the buffer is large, there might be a few duplicates, but that wouldn't be a problem as we anyway have to deal with false positives.) Once the buffer is full, an xor filter can be built from that. In case you want to stream more entries than fit in a buffer, I wonder if it makes sense to combine multiple xor filters. Lookup would be slower. An alternative is to use temp files to sort the keys (they don't need to be fully sorted; approximate sorting is fine).