/quicksilver

Quicksilver - a library of approximate algorithms and sketches for Rust

Primary LanguageRustApache License 2.0Apache-2.0

Overview

Quicksilver is a library of approximate algorithm and sketches

The algorithms contained within this library are designed to calculate common metrics and statistics (such as cardinality, frequency, etc) in an approximate fashion. In exchange for decreased accuracy, these algorithms typically have minimal memory overhead or are very fast. Or both.

Build status with Rust Nightly: Build Status

##Documentation

API Docs are auto-generated after every commit: http://www.rust-ci.org/polyfractal/quicksilver/doc/quicksilver/

Public modules

HLL - HyperLogLog

Approximates cardinality estimation with minimal memory overhead

This implements HyperLogLog, an algorithm which provides a reasonably accurate estimation of cardinality. It is very fast (200m op/s on my Macbook Air) and requires minimal memory. It can estimate the cardinality of sets that contain billions of entries.

See this article for a good laymen explanation of HyperLogLog.

Usage

use quicksilver::hll::HLL;
use std::hash::Hash;

let mut hll = HLL::new(12);

for i in range(0u, 1000000) {
  let hash = hash::hash(&i);
  hll.offer_hashed(&hash);
}

let cardinality = hll.cardinality();

Precision vs Memory chart

PrecisionSizeWorst Case Estimate Error (+/-)
448 bytes39.00%
560 bytes27.50%
684 bytes19.50%
7136 bytes13.78%
8240 bytes9.75%
9444 bytes6.89%
10852 bytes4.8%
111672 bytes3.44%
123312 bytes2.43%
136588 bytes1.72%
1413140 bytes1.21%
1526248 bytes0.86%
1652464 bytes0.60%
17104892 bytes0.43%
18209748 bytes0.30%

PCSA - Probabilistic Counting with Stochastic Averaging

Implements the Probabilistic Counting with Stochastic Averaging counter. PCSA provides an approximate estimate of cardinality with bounded error. Relative error is 0.78 / sqrt(m)

See this article for a layman's explanation of PCSA

Original paper can be found here

Note: PCSA is generally inferior to HLL in estimation accuracy, memory usage and performance

Usage

use quicksilver::pcsa::PCSA;
use std::hash::Hash;

let mut hll = PCSA::new(10);

for i in range(0u, 1000000) {
  let hash = hash::hash(&i);
  pcsa.offer_hashed(hash);
}

let cardinality = pcsa.cardinality();

Precision vs Memory chart

PrecisionSizeError on 1m Cardinality TestWorst Case Estimate Error (+/-)
480 bytes 0.09% 39.00%
5144 bytes 4.14% 34.88%
6272 bytes 1.18% 31.84%
7528 bytes 3.62% 29.48%
81040 bytes 0.71% 27.57%
92064 bytes 1.51% 26.00%
104112 bytes 0.77% 24.66%
118208 bytes 1.15% 23.51%
1216400 bytes 1.10% 22.51%
1332784 bytes 0.96% 21.63%
1465552 bytes 0.20% 20.84%
15131088 bytes 0.18% 20.13%