A Rust library for counting distinct elements in a stream, based on ClickHouse's uniq data structure.
It implements BJKST, a probabilistic algorithm based on adaptive sampling that produces fast, accurate, and deterministic estimates. Two BJKST instances can be merged, which makes the data structure well suited to map-reduce settings.
use uniq_ch::Bjkst;
let mut bjkst = Bjkst::new();
// Add some elements, with duplicates.
bjkst.extend(0..75_000);
bjkst.extend(25_000..100_000);
// Count the distinct elements. The result is an estimate of the
// 100_000 distinct values, hence the range check.
assert!((99_000..101_000).contains(&bjkst.len()));
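
To illustrate the adaptive sampling and merging mentioned above, here is a minimal, standard-library-only sketch. It is not the uniq_ch implementation: `ToySketch`, `mix`, the `capacity` parameter, and the choice of a SplitMix64-style mixer as the hash are all assumptions made for illustration. The idea is to keep only hashes whose low `level` bits are zero, raise `level` whenever the sample outgrows its capacity, and scale the sample size back up by `2^level` to estimate the count.

```rust
use std::collections::HashSet;

/// Toy adaptive-sampling counter (hypothetical, for illustration only).
struct ToySketch {
    level: u32,           // keep only hashes with at least `level` trailing zero bits
    sample: HashSet<u64>, // hashes currently retained
    capacity: usize,      // maximum sample size; assumed to be greater than zero
}

/// SplitMix64-style finalizer, used here as a stand-in hash function.
fn mix(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

impl ToySketch {
    fn new(capacity: usize) -> Self {
        ToySketch { level: 0, sample: HashSet::new(), capacity }
    }

    /// A hash survives at `level` if its low `level` bits are all zero.
    fn keeps(level: u32, hash: u64) -> bool {
        hash.trailing_zeros() >= level
    }

    /// Raise the level until the sample fits in `capacity` again.
    /// Each step discards roughly half of the retained hashes.
    fn shrink(&mut self) {
        while self.sample.len() > self.capacity {
            self.level += 1;
            let level = self.level;
            self.sample.retain(|&h| Self::keeps(level, h));
        }
    }

    fn insert(&mut self, item: u64) {
        let h = mix(item);
        if Self::keeps(self.level, h) {
            self.sample.insert(h);
            self.shrink();
        }
    }

    /// Merge another sketch into this one (both assumed to share a capacity).
    /// Both sides are first brought to the higher of the two levels.
    fn merge(&mut self, other: &ToySketch) {
        self.level = self.level.max(other.level);
        let level = self.level;
        self.sample.retain(|&h| Self::keeps(level, h));
        self.sample
            .extend(other.sample.iter().copied().filter(|&h| Self::keeps(level, h)));
        self.shrink();
    }

    /// Each retained hash stands for about 2^level distinct elements.
    fn estimate(&self) -> u64 {
        (self.sample.len() as u64) << self.level
    }
}
```

In this toy version the final state depends only on the set of distinct inputs, so merging two sketches gives the same estimate as feeding every element into a single sketch. That property is what makes a structure of this kind convenient for map-reduce: each worker builds its own sketch and the reducer merges them.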