/libfilter

High-speed Bloom filter variant for C and C++

Primary LanguageHTMLApache License 2.0Apache-2.0

This repository provides implementations of blocked Bloom filters for C, C++, and Java. These filters use slightly more space than traditional Bloom filters but are much faster.

Example usage in C:

#include <filter/block.h>

unsigned ndv = 1000000;
double fpp = 0.0065;
uint64_t hash = 0xfeedbadbee52b055;

libfilter_block filter;

unsigned bytes = libfilter_block_bytes_needed(ndv, fpp);
libfilter_block_init(bytes, &filter);
libfilter_block_add_hash(hash, &filter);
assert(libfilter_block_find_hash(hash, &filter));
libfilter_block_destruct(&filter);

in C++:

#include <filter/block.hpp>

unsigned ndv = 1000000;
double fpp = 0.0065;
uint64_t hash = 0xfeedbadbee52b055;

auto filter = filter::BlockFilter::CreateWithNdvFpp(ndv, fpp);
filter.InsertHash(hash);
assert(filter.FindHash(hash));

in Java

import com.github.jbapple.libfilter.BlockFilter;

int ndv = 1000000;
double fpp = 0.0065;
long hash = (((long) 0xfeedbadb) << 32) | (long) 0xee52b055;

BlockFilter = BlockFilter.CreateWithNdvFpp(ndv, fpp);
filter.AddHash64(hash);
assert filter.FindHash64(hash)

To install:

make
# probably needs a sudo:
make install
make java-world
mvn -f ./java/ install -Dmaven.test.skip=true
[optional: copy java/target/libfilter-*.jar to your classpath]

Block filters are most space-efficient at around 11.5 bits per distinct value, which produces a false positive probability of 0.65%. A traditional Bloom filter would only need 10.5 bits per distinct value to achieve that same false positive probability, while a cuckoo filter would need 10.7 bits per distinct value.