/basiccms.rs

A simple count-min-sketch in rust.

Primary LanguageRustOtherNOASSERTION

basic count-min sketch

It's the count-min sketch structure implemented in rust.

You can use it to count frequencies of events with some degree of uncertainty. It uses sub-linear space, but allows for possible hash collisions that could lead to over counting.

Conceptually, it's like a bloom filter, but for multi-sets rather than strict sets.

In order to use it, you need to provide an implementation of the trait basiccms::IntoSketch that describes how to turn some type into the underlying u64 used by the hash buckets.

Ostensibly we could just use an implementation like this:

impl<T: Hash> IntoSketch for T {
    fn asu64(&self) -> u64 {
        let mut hasher = SipHasher::new();
        self.hash(&mut hasher);

        hasher.finish()
    }
}

But that precludes occasions where we might not want an extra hashing pass, and there's an "obvious" way to convert to a u64 (for example a u32 might just get widened...). It's not defined in the crate itself, because so many types implement Hash it would cause annoying collisions, but no reason you can't add it yourself in another scope.

Once you've actually got a method of putting your data into the sketch by implementing the trait, you can use it heterogenuously if you feel like it (unlike, say sketchy):

extern crate basiccms;

use basiccms::Sketch;

#[test]
fn we_should_be_able_to_add_heterogenuously () {
        let mut sketch = Sketch::new(0.0001, 0.8);

        sketch.add(1); sketch.add(2); sketch.add(3);
        sketch.add("foo"); sketch.add("bar"); sketch.add("quux");

        assert_eq!(1, sketch.point("foo"));
        sketch.add("foo");

        assert_eq!(2, sketch.point("foo"));
}

At the moment all that's offered is the sketch.point point query method, which returns the class "min count" frequency estimation.

Patches welcome!