Experimental work on learning-augmented AMQs
Requires bitarray
, mmh3
, and torch
, all available on pip.
To learn how to run tests, go to the src
directory and run
python test.py -h
This will display a help message that lists all of the available options.
Run the following to see the bloom filter in action.
(Use toast
or sandwich
in place of bloom
to see how well the other filters do on the default parameters.)
python test.py -f bloom
- Bloom filter
- Open sandwich bloom filter (Kraska)
- Sandwiched bloom filter (Mitzenmacher)
The Bloom filter needs a hashable input, and the model needs matrix of inputs in [0,1]. To meet both requirements, we use the following approach:
Our abstraction is that we are working with strings of n
letters on an alphabet of size C
.
The model gets a 1x(Cn)
row vector of values in {0, 1}
where only one of the scalars in the vector is 1, all others 0.
In other words, for the letter with index 0 <= i < C
, the vector has 0s everywhere except at location i
, which has value 1.
The filter gets an n
-element tuple of integers from 0
to C-1
, which is hashable.
Note: We can convert decimal numbers to Word
s by using C=10, converting each number's decimal place to a letter in [0..9]
We handle these conversions by adding a Word type, along with WordNet and WordBloom wrappers for Net and Bloom, respectively.
We estimate the size of the machine learniing model by counting the number of elements in the two matrices that comprise the model's layers, and then multiply by the size of a float in python (24 bytes).
- heuristic for choosing tau
- training time/epochs
- neural net hyperparameters
- ML model
- How much do we train the neural net / pre-filter?
- Build overflow Bloom filter
- Testing rig (setup sending different inputs to model vs filter)
- Sandwich
- Tuning tau