Count-Min Sketch and Consistent Weighted Sampling implementation
Quick look:
This is a tool for generating Count-Min sketches and consistent weighted samples for data streams of varying sizes. The implementation has been tested on artificially generated datasets with size up to
- Implementations of distance measures (Jaccard, Hamming, Euclidean, Edit, Manhattan, Cosine).
- Count-Min Sketch (CMS) tables: adding stream elements to the table and fast lookup.
- Consistent Weighted Sampling (CWS) with the settings from the original paper. CWS adaptation over Count-Min tables and iterable streams. Sketching of CMS tables.
- Artificial dataset generation tools and random number generation from uniform, gamma, and beta distributions.
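
The Count-Min table behaviour listed above (adding stream elements, then fast lookup) can be illustrated with a minimal C++11 sketch. This is not the repository's API: the class name `CountMinSketch`, the methods `add` and `estimate`, and the hash mixing constants are all assumptions made for illustration. The row hashes here are a simple mix of `std::hash`, not the pairwise-independent family used in the formal error analysis.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical illustration class; names and layout are assumptions,
// not taken from this repository.
class CountMinSketch {
public:
    CountMinSketch(std::size_t width, std::size_t depth)
        : width_(width), depth_(depth), table_(width * depth, 0) {
        assert(width > 0 && depth > 0);
    }

    // Add `count` occurrences of `item`: one counter per row is incremented.
    void add(const std::string& item, std::uint64_t count = 1) {
        for (std::size_t row = 0; row < depth_; ++row) {
            table_[row * width_ + index(item, row)] += count;
        }
    }

    // Point query: the minimum counter over all rows.
    // Collisions can only inflate counters, so this never underestimates.
    std::uint64_t estimate(const std::string& item) const {
        std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
        for (std::size_t row = 0; row < depth_; ++row) {
            best = std::min(best, table_[row * width_ + index(item, row)]);
        }
        return best;
    }

private:
    // Row-specific bucket: mix std::hash with a per-row constant.
    // (Assumed mixing scheme, chosen for brevity.)
    std::size_t index(const std::string& item, std::size_t row) const {
        std::uint64_t h = std::hash<std::string>()(item);
        h ^= (row + 1) * 0x9E3779B97F4A7C15ULL;
        h *= 0xBF58476D1CE4E5B9ULL;
        h ^= h >> 31;
        return static_cast<std::size_t>(h % width_);
    }

    std::size_t width_;
    std::size_t depth_;
    std::vector<std::uint64_t> table_;  // depth_ rows of width_ counters, row-major
};
```

For example, after constructing `CountMinSketch cms(64, 4)` and calling `cms.add("a", 3)`, the value of `cms.estimate("a")` is at least 3 and at most the total count added to the sketch. A wider table reduces collisions per row, while more rows reduce the chance that every row collides for the same item.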
Note: A compiler supporting C++11 or higher is required. Boost must be installed for some features.