A recent Incremental version of the ideas in this code appears in Prefix-Filter repository. The Prefix-Filter is an incremental filter, with faster insertions throughput than Bloom, CF and VQF, and query throughput which is slightly worse than the CF with the same false positive rate (which is faster than others ''hashtable for fingerprints'' filters).
Currently benchmarking:
- Bloomfilter (BF)
- Cuckoo filter (CF)
- SIMD blocked Bloom filter (SIMD)
- Morton filter (MF)
- Pocket Dictionary filter (PD)
The files test.hpp test.cpp
contain validation tests on the filter.
- Making sure the filter does not have a false negative. (Indicating the element is in not in the filter when it is.)
- Checking the filter false positive rate is as expected.
Filter often have a parameter controlling on the false positive probability
$\epsilon$ , when it is increased, the filter uses more space, and has smaller error probability.
There are various benchmark to evaluate the error probability under differents loads, and speed test by four paramaters: Insertions, uniform lookup (uniform lookup result is "no" w.h.p. in standard scenarios), True-lookup (of elements in the filter) and Deletions.
Since CF uses openssl
library, the project won't compile unless it is installed. (See CF repository)
git clone -b Simpler https://github.com/TheHolyJoker/Comparing_Filters.git
cd Comparing_Filters
mkdir build
cd build
cmake..
cmake --build ./ --target Filters
In build
directory run
./Filters <filter indicator> <exponent of number of keys> <lookup factor> <rounds>
-
filter indicator
: Which filter to test.- To include BF in the test,
filter indicator & 1
should be true. - To include CF in the test,
filter indicator & 2
should be true. - To include SIMD in the test,
filter indicator & 4
should be true. - To include MF in the test,
filter indicator & 8
should be true. - To include PD in the test,
filter indicator & 16
should be true.
The default value is -1 to test all filters.
- To include BF in the test,
-
exponent of the number of keys
: Every filter is built to contain at most 2^exponent of the number of keys
.
The default value is 24. (should not be set to less than 16 or MF might fail) -
lookup factor
: Lookup exponent factor. If set to d and n insertions will be performed, then n*2^d lookups will be performed.
The default value is 2 -
rounds
: The benchmark performs insertion, and then lookup where each time a fraction of1/rounds
of the total number of elements is queried.
The default value is 32.
Large parts of the code and its structure are taken from https://github.com/FastFilter/fastfilter_cpp.
Cuckoo filter is from https://github.com/efficient/cuckoofilter by Bin Fan et al.
SIMD blocked Bloom filter is from https://github.com/apache/impala.
Morton filter is from https://github.com/AMDComputeLibraries/morton_filter.
Counting Quotient Filter (CQF) is from https://github.com/splatlab/cqf. (Currently not in use).
Pocket Dictionary is work in progress see https://github.com/TomerEven/Pocket_Dictionary. The Pocket Dictionary class that uses advanced SIMD instructions, is taken from here by Jim Apple (@Jbapple).
Bloom filter https://en.wikipedia.org/wiki/Bloom_filter
Cuckoo Filter "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky
SIMD blocked Bloom filter Cache-, Hash- and Space-Efficient Bloom Filters
Morton filter Morton Filters: Faster, Space-Efficient Cuckoo Filters via Biasing, Compression, and Decoupled Logical Sparsity,
Quotient Filter A General-Purpose Counting Filter: Counting Quotient Filter (CQF)
Pocket Dictionary Fully-Dynamic Space-Efficient Dictionaries and Filters with Constant Number of Memory Accesses
- Add Filters
- Vacuum-Filter paper repository
- Quotient-Filter repository
- Counting filter benchmark.