/BloomFilter

Go package implementing Bloom filter

Primary LanguageGoMIT LicenseMIT

Bloom-Filter

GitHub license GoDoc

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions, generating a uniform random distribution.

m -> size of bit array
k -> number of hash function
n -> estimated number of element to add in bloom filter

False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set".

Supported Operations

  insert(x) : To insert an element in the Bloom Filter.
  lookup(x) : to check whether an element is already present 
              in Bloom Filter with a false positive probability.

NOTE: Delete operation can be supported by using Counting Bloom filter.

Install

go get github.com/akgarhwal/bloomfilter

Usage

// create a new bloom filter 
// with 100 elements to be inserted and 
// false positive probability of 0.1
bf := bloomfilter.NewBloomFilter(100, 0.1)
bf.Insert([]byte("sachin"))
bf.Insert([]byte("rohit"))
bf.Insert([]byte("virat"))
bf.Insert([]byte("msdhoni"))

// new user name to check
newUserName := "msdhoni"

if bf.Lookup([]byte(newUserName)) {
  fmt.Println(newUserName, " is present")
} else {
  fmt.Println(newUserName, " is NOT present")
}

Probability of False Positive

If m is size of bit array and n is number of elements to be inserted and k is number of hash functions, then the probability of false positive ε can be calculated as:
Screenshot 2021-09-14 at 1 56 10 PM

Size of Bit Array:

If expected number of elements n is known and desired false positive probability is ε then the size of bit array:
Screenshot 2021-09-14 at 2 00 54 PM

Optimal number of hash functions

The number of hash functions k must be a positive integer. If m is size of bit array and n is number of elements to be inserted, then k can be calculated as:
Screenshot 2021-09-14 at 2 05 38 PM

License

This package is licensed under MIT license. See LICENSE for details.