Countish implements two approximate counting algorithms outlined in “Approximate Frequency Counts over Data Streams”.
http://www.vldb.org/conf/2002/S10P03.pdf
Currently it only works with strings. The public API isn’t fixed, I expect to support other hashable types. Once I learn a little more rust ; )
See this blog post for some more background on counting and the go implementation: http://whitane.com/post/how-to-countish/
The rust impl is actually much faster than the go version. Which given that I know lots of go and almost no rust is pretty exciting.
Have you ever needed to do something like calculate the top URLs or top ips from an infinite stream? This package provides probabalistic frequency counters, with accuracy guarantees and low memory usage.
countish provides an extremely simple interface consisting of an “observe” method and a “items_above_threshold” method.
Example:
cargo build --release
cat urls.txt | ./target/release/rcountish --threshold 0.3
/ 0.42867142857142854
3 counting implementations are provided.
- Naive: exact counts are held in a map
- Lossy: corresponding to “lossy counting”
- Sticky: corresponding to “sticky sampling”