gakhov/pdsa

Implement Random Sampling

gakhov opened this issue · 0 comments

The Random sampling algorithm, often referred to as MRL, was published by Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce Lindsay in 1999 [1] and addressed the problem of the correct sampling and quantile estimation. It consists of the non-uniform sampling technique and deterministic quantile finding algorithm.

I suggest implementing a simpler version of the MRL algorithm that was proposed by Ge Luo, Lu Wang, Ke Yi, and Graham Cormode in 2013 [2], [3], and denoted in the original articles as Random.

References

[1] Manku, G., et al: Random sampling techniques for space efficient
online computation of order statistics of large datasets. Proceedings
of the 1999 ACM SIGMOD International conference on Management
of data, Philadelphia, Pennsylvania, USA - May 31–June 03, 1999,
pp. 251–262, ACM New York, NY (1999)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.5750&rep=rep1&type=pdf
[2] Wang, L., et al: Quantiles over data streams: an experimental
study. Proceedings of the 2013 ACM SIGMOD International
conference on Management of data, New York, NY, USA - June
22–27, 2013, 2013, pp. 737–748, ACM New York, NY (2013)
http://dimacs.rutgers.edu/~graham/pubs/papers/nquantiles.pdf
[3] Luo, G., Wang, L., Yi, K. et al.: Quantiles over data streams:
experimental comparisons, new analyses, and further improvements.
The VLDB Journal. Vol. 25 (4), 449–472 (2016)
http://dimacs.rutgers.edu/~graham/pubs/papers/nquantvldbj.pdf