Quicksilver is a library of approximate algorithm and sketches
The algorithms contained within this library are designed to calculate common metrics and statistics (such as cardinality, frequency, etc) in an approximate fashion. In exchange for decreased accuracy, these algorithms typically have minimal memory overhead or are very fast. Or both.
Build status with Rust Nightly:
##Documentation
API Docs are auto-generated after every commit: http://www.rust-ci.org/polyfractal/quicksilver/doc/quicksilver/
Approximates cardinality estimation with minimal memory overhead
This implements HyperLogLog, an algorithm which provides a reasonably accurate estimation of cardinality. It is very fast (200m op/s on my Macbook Air) and requires minimal memory. It can estimate the cardinality of sets that contain billions of entries.
See this article for a good laymen explanation of HyperLogLog.
use quicksilver::hll::HLL;
use std::hash::Hash;
let mut hll = HLL::new(12);
for i in range(0u, 1000000) {
let hash = hash::hash(&i);
hll.offer_hashed(&hash);
}
let cardinality = hll.cardinality();
Precision | Size | Worst Case Estimate Error (+/-) |
4 | 48 bytes | 39.00% |
5 | 60 bytes | 27.50% |
6 | 84 bytes | 19.50% |
7 | 136 bytes | 13.78% |
8 | 240 bytes | 9.75% |
9 | 444 bytes | 6.89% |
10 | 852 bytes | 4.8% |
11 | 1672 bytes | 3.44% |
12 | 3312 bytes | 2.43% |
13 | 6588 bytes | 1.72% |
14 | 13140 bytes | 1.21% |
15 | 26248 bytes | 0.86% |
16 | 52464 bytes | 0.60% |
17 | 104892 bytes | 0.43% |
18 | 209748 bytes | 0.30% |
Implements the Probabilistic Counting with Stochastic Averaging counter. PCSA provides an approximate estimate of cardinality with bounded error. Relative error is 0.78 / sqrt(m)
See this article for a layman's explanation of PCSA
Original paper can be found here
Note: PCSA is generally inferior to HLL in estimation accuracy, memory usage and performance
use quicksilver::pcsa::PCSA;
use std::hash::Hash;
let mut hll = PCSA::new(10);
for i in range(0u, 1000000) {
let hash = hash::hash(&i);
pcsa.offer_hashed(hash);
}
let cardinality = pcsa.cardinality();
Precision | Size | Error on 1m Cardinality Test | Worst Case Estimate Error (+/-) |
4 | 80 bytes | 0.09% | 39.00% |
5 | 144 bytes | 4.14% | 34.88% |
6 | 272 bytes | 1.18% | 31.84% |
7 | 528 bytes | 3.62% | 29.48% |
8 | 1040 bytes | 0.71% | 27.57% |
9 | 2064 bytes | 1.51% | 26.00% |
10 | 4112 bytes | 0.77% | 24.66% |
11 | 8208 bytes | 1.15% | 23.51% |
12 | 16400 bytes | 1.10% | 22.51% |
13 | 32784 bytes | 0.96% | 21.63% |
14 | 65552 bytes | 0.20% | 20.84% |
15 | 131088 bytes | 0.18% | 20.13% |