/Probabilistic-Algorithms

Introduction to common Probabilistic Algorithms: Approximate Counting, Flajolet-Martin, LogLog, HyperLogLog, Bloom Filters

Primary LanguageJupyter Notebook

Probabilistic Algorithms

This Jupyter Notebook is an introduction to common probabilistic algorithms that are usually found in relational databases (for query optimization) and in the analysis of data streams. The algorithms are: Approximate Counting (for estimating the size of a multiset), Flajolet-Martin, LogLog, HyperLogLog (for estimating the cardinality of a multiset) and Bloom Filters (query membership). This introduction not only describes these algorithms, but also implements them in Python and provides a visual analysis of their performance.