Probabilistic Data Structures for Realtime Analytics (PyData 2013)

Primary language: Python

More and more applications now deal with massive data that needs to be processed in realtime. While easing the development of realtime analytics applications, computing platforms like Storm increase the need for efficient algorithms that can run in a single pass over the data stream. In this talk, I'll give a brief overview of some interesting probabilistic data structures that can be used in this context: Bloom filter, Temporal Bloom filter, Count-Min Sketch and HyperLogLog.
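
To give a flavor of the first structure on that list, here is a minimal sketch of a Bloom filter processing a stream in a single pass. It is an illustration only, not the code from the talk: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the choice of salted SHA-256 from the standard library as the hash family are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_bits and num_hashes are hypothetical defaults chosen for the example.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # The item may be present only if all of its bit positions are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
for user in ["alice", "bob", "carol"]:  # one pass over the (tiny) stream
    seen.add(user)

print("alice" in seen)  # an added item is always reported as present
```

The memory footprint stays fixed at `num_bits` bits no matter how many items pass through, which is what makes this kind of structure attractive for stream processing. The price is that a membership test on an item that was never added can occasionally return true.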