/xf8

8-bit Xor Filter in C99

Primary LanguageCThe UnlicenseUnlicense

8-bit Xor Filter

This is a C99 implementation of an 8-bit xor filter. It has a fixed error rate of 1/256 (~0.39%) and uses ~9.84 bits of storage per element. See Xor Filters: Faster and Smaller Than Bloom Filters.

Memory Usage

By default only up to 2^32 elements are supported, which will require 123GB of memory to process. Much larger sets are supported by using -DXF8_64BIT at compile time, at the cost of roughly doubling memory usage. For example, that same set of 2^32 elements will then require a total of 211GB of memory to process.

Despite the library's name, the error rate can be reduced to 1/65,536 (~0.0015%) by using -DXF8_BITS=16 at compile time. This doubles the size of the filter to ~19.68 bits per element.

Example

The example program (tests/example.c) is a probabilistic spell checker:

$ make
$ tests/example </usr/share/dict/words >spelling.db
$ printf 'hello\nfoobarbaz' | tests/example spelling.db
Y hello
N foobarbaz