-
The project is an assignment for the master's course Advances in Data Mining. It is based on probabilistic methods for counting the distinct elements of a stream from their hash values.
-
Scripts: LogLog_Counting Algorithm, Probabilistic_Counting Algorithm and Trailing_Zeros Algorithm.
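A minimal Python sketch of the three estimators. It assumes each hash is a 32-bit integer and that the low k bits select a bucket, while the trailing zeros of the remaining bits act as the observed bit pattern. The function names and the parameters k and width are hypothetical, not taken from the scripts. PHI = 0.77351 is the Flajolet-Martin correction constant and 0.39701 is the asymptotic LogLog constant of Durand and Flajolet.

```python
PHI = 0.77351           # Flajolet-Martin correction constant
ALPHA_LOGLOG = 0.39701  # asymptotic LogLog constant (Durand-Flajolet)


def trailing_zeros(x, width=32):
    """Count trailing 0-bits of a width-bit value; an all-zero value counts as width."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def trailing_zeros_estimate(stream, width=32):
    """Single-register estimate 2**R, where R is the max trailing zeros seen."""
    return 2 ** max(trailing_zeros(x, width) for x in stream)


def probabilistic_counting_estimate(stream, k=8, width=32):
    """PCSA: the low k bits pick one of m = 2**k bitmaps; each bitmap records
    which trailing-zero counts occurred, and R_j is its lowest unset bit."""
    m = 1 << k
    bitmaps = [0] * m
    for x in stream:
        j, rest = x & (m - 1), x >> k
        bitmaps[j] |= 1 << trailing_zeros(rest, width - k)
    total_r = 0
    for bitmap in bitmaps:
        r = 0
        while (bitmap >> r) & 1:  # find the lowest zero bit
            r += 1
        total_r += r
    return m / PHI * 2 ** (total_r / m)


def loglog_estimate(stream, k=8, width=32):
    """LogLog: each of m = 2**k registers keeps the max rho (1-based position
    of the lowest 1-bit) among the values routed to it."""
    m = 1 << k
    registers = [0] * m
    for x in stream:
        j, rest = x & (m - 1), x >> k
        rho = trailing_zeros(rest, width - k) + 1
        registers[j] = max(registers[j], rho)
    return ALPHA_LOGLOG * m * 2 ** (sum(registers) / m)
```

For uniform hashes, reading trailing zeros instead of leading bits is an equivalent choice. The single-register trailing-zeros estimate is only accurate to within a power of two, which is why the other two estimators average over m = 2**k buckets.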
-
The data were generated randomly to simulate the output of a hash function, so they are uniformly distributed binary bitstreams.
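One way to produce comparable data, assuming 32-bit values. simulated_hash_stream and to_bitstring are hypothetical helpers, not the project's generator. Sampling without replacement fixes the true distinct count exactly, and repeating each value models duplicate elements that hash to the same value.

```python
import random


def simulated_hash_stream(n_distinct, repeats=3, bits=32, seed=None):
    """Return a shuffled stream of bits-bit integers with exactly n_distinct distinct values."""
    rng = random.Random(seed)
    # Distinct uniform values stand in for the hashes of distinct elements.
    distinct = rng.sample(range(2 ** bits), n_distinct)
    # Each element occurs `repeats` times, since duplicates hash to the same value.
    stream = distinct * repeats
    rng.shuffle(stream)
    return stream


def to_bitstring(value, bits=32):
    """Render a value as a zero-padded binary string of length bits."""
    return format(value, f"0{bits}b")
```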
-
Read "advances-data-mining.pdf" for further information.
-
Hyper_LogLog Algorithm --> "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm".
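A sketch of the HyperLogLog estimator, assuming 32-bit hash values whose low k bits choose one of m = 2**k registers. hll_alpha and hyperloglog_estimate are hypothetical names. The sketch uses the paper's alpha_m constants, the normalised harmonic mean of 2**M_j, linear counting for the small range, and the large-range correction for 32-bit hashes.

```python
import math


def hll_alpha(m):
    """Bias-correction constant alpha_m from the HyperLogLog paper."""
    if m == 16:
        return 0.673
    if m == 32:
        return 0.697
    if m == 64:
        return 0.709
    return 0.7213 / (1 + 1.079 / m)


def hyperloglog_estimate(stream, k=10, width=32):
    m = 1 << k
    rest_width = width - k
    registers = [0] * m
    for x in stream:
        j, rest = x & (m - 1), x >> k
        # rho: 1-based position of the lowest 1-bit, rest_width + 1 if rest == 0
        tz = rest_width if rest == 0 else (rest & -rest).bit_length() - 1
        registers[j] = max(registers[j], tz + 1)
    # Raw estimate: normalised harmonic mean of 2**M_j.
    raw = hll_alpha(m) * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)  # small range: linear counting
    if raw > 2 ** 32 / 30:
        # Large range correction; assumes width == 32.
        return -(2 ** 32) * math.log(1 - raw / 2 ** 32)
    return raw
```

The harmonic mean is the main change from LogLog's geometric mean, because it damps the effect of a few unusually large registers.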
-
LogLog-Beta --> "LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting".
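LogLog-Beta keeps HyperLogLog-style registers but replaces the range corrections with a single formula, E = alpha_m * m * (m - z) / (beta(m, z) + sum_j 2**(-M_j)), where z is the number of empty registers. In this sketch, beta is a caller-supplied function because its coefficients are fitted per precision in the paper and are not reproduced here. loglog_beta_estimate and its parameters are hypothetical, and 32-bit hash values are assumed.

```python
def loglog_beta_estimate(stream, beta, k=10, width=32):
    """Estimate cardinality with a caller-supplied bias function beta(m, z)."""
    m = 1 << k
    rest_width = width - k
    registers = [0] * m
    for x in stream:
        j, rest = x & (m - 1), x >> k
        # rho: 1-based position of the lowest 1-bit, rest_width + 1 if rest == 0
        tz = rest_width if rest == 0 else (rest & -rest).bit_length() - 1
        registers[j] = max(registers[j], tz + 1)
    z = registers.count(0)                # number of empty registers
    alpha = 0.7213 / (1 + 1.079 / m)      # alpha_m, intended for m >= 128
    return alpha * m * (m - z) / (beta(m, z) + sum(2.0 ** -r for r in registers))
```

Passing `lambda m, z: 0.0` disables the bias term. This is useful only as a sanity check, because with no empty registers the formula then reduces to the raw HyperLogLog estimate.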