/StreamCardinalityEstimation

Implementation of Cardinality Estimation algorithms on top of Apache Flink. We use two approaches, aggregate function and window slicing.

Primary LanguageJupyter Notebook

StreamCardinality

Implementation of Cardinality Estimation algorithms on top of Apache Flink. We use two approaches, aggregate function and window slicing.

Algorithms to Cover:

  • AdaptiveCounting
  • LogLog 
  • HyperLogLog
  • HyperLogLogPlus
  • LinearCounting
  • CountThenEstimate
  • BJKST
  • FlajoletMartin
  • KMinCount

This project is part of the work at University of Tartu.

Project Plan

https://docs.google.com/spreadsheets/d/1f6ZCpe5v5OReKNgZvxcz88oD-Wjr0PQzARm6kdSgSt4/edit?usp=sharing

Project Presentation

https://drive.google.com/open?id=1NtyDCdAn6-ROgFNVdtxt6RcK7trKXSJ-

References

[1] Carbone, Paris, et al. "Apache flink: Stream and batch processing in a single engine." Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36.4 (2015).

[2] Traub, Jonas, et al. "Efficient Window Aggregation with General Stream Slicing." (2019).

[3] https://github.com/Elkoumy/streaminer

[4] Harmouch, Hazar, and Felix Naumann. "Cardinality estimation: an experimental survey." Proceedings of the VLDB Endowment 11.4 (2017): 499-512.

[5] Chintapalli, Sanket, et al. "Benchmarking streaming computation engines: Storm, flink and spark streaming." 2016 IEEE international parallel and distributed processing symposium workshops (IPDPSW). IEEE, 2016.