/learned-filters

Experimental work on learning-augmented AMQs

Primary LanguagePython

learned-filters

Experimental work on learning-augmented AMQs

Technical stack

Requires bitarray, mmh3, and torch, all available on pip.

How to use

To learn how to run tests, go to the src directory and run

python test.py -h

This will display a help message that lists all of the available options.

Quick start

Run the following to see the bloom filter in action. (Use toast or sandwich in place of bloom to see how well the other filters do on the default parameters.)

python test.py -f bloom

Approach

Filter architectures

  • Bloom filter
  • Open sandwich bloom filter (Kraska)
  • Sandwiched bloom filter (Mitzenmacher)

Testing setup

The Word type

The Bloom filter needs a hashable input, and the model needs matrix of inputs in [0,1]. To meet both requirements, we use the following approach:

Our abstraction is that we are working with strings of n letters on an alphabet of size C.

The model gets a 1x(Cn) row vector of values in {0, 1} where only one of the scalars in the vector is 1, all others 0. In other words, for the letter with index 0 <= i < C, the vector has 0s everywhere except at location i, which has value 1.

The filter gets an n-element tuple of integers from 0 to C-1, which is hashable.

Note: We can convert decimal numbers to Words by using C=10, converting each number's decimal place to a letter in [0..9]

We handle these conversions by adding a Word type, along with WordNet and WordBloom wrappers for Net and Bloom, respectively.

Estimating model size

We estimate the size of the machine learniing model by counting the number of elements in the two matrices that comprise the model's layers, and then multiply by the size of a float in python (24 bytes).

Hyperparameters

  • heuristic for choosing tau
  • training time/epochs
  • neural net hyperparameters
  • ML model

Lingering questions

  • How much do we train the neural net / pre-filter?

TODO

  • Build overflow Bloom filter
  • Testing rig (setup sending different inputs to model vs filter)
  • Sandwich
  • Tuning tau