This Jupyter Notebook is an introduction to common probabilistic algorithms that are usually found in relational databases (for query optimization) and in the analysis of data streams. The algorithms are: Approximate Counting (for estimating the size of a multiset), Flajolet-Martin, LogLog, HyperLogLog (for estimating the cardinality of a multiset) and Bloom Filters (query membership). This introduction not only describes these algorithms, but also implements them in Python and provides a visual analysis of their performance.
lucasschmidtc/Probabilistic-Algorithms
Introduction to common Probabilistic Algorithms: Approximate Counting, Flajolet-Martin, LogLog, HyperLogLog, Bloom Filters
Jupyter Notebook