/quantiles

Data structures for computing quantiles efficiently.

Primary LanguageC++MIT LicenseMIT

What is this?

This repository contains a C++11 implementation of the data structure described in Section 4.1 of the paper

  1. G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava. Space- and time-efficient deterministic algorithms for biased quantiles over data streams. In ACM Principles of Database Systems (PODS), 2006.

The intended use-case is that we have a large stream of integers and we want to efficiently approximate rank and quantile queries for the stream without storing the whole stream.

The data structure supports the following operations:

construction(b, ε)

Preconditions: 0 < b ≤ 31 and 0 < ε ≤ 0.5.

b is the number of bits in the universe and U = 2b is the size of the universe. In this implementation U must be less than 231 (i.e., 31 bit integers).

ε is a parameter which controls the trade-off between space usage and accuracy. The space used by the data structure is bounded by O(log(U) · 1/ε · (log(ε N) + log(U))). See rank and quantile below for the accuracy guarantees.

insert(x)

Precondition: x is an integer such that 0 ≤ x < U.

Insert x. Amortized time complexity: O(log(U)).

rank(x)

Precondition: x is an integer such that 0 ≤ x < U.

Compute the approximate rank of x. Let true_rank(x) = number of inserted items y such that y < x. Then it is guaranteed that abs(rank(x) - true_rank(x)) < ε · true_rank(x).

quantile(p)

Precondition: 0 ≤ p ≤ 1.

Compute an approximate p quantile. It is guaranteed that

      true_rank(x) · (1 − ε) ≤ pN ≤ true_rank(x + 1) · (1 + ε)

where x = quantile(p) and N is the number of items that have been inserted. quantile(p) is implemented by binary searching for a suitable x using rank(·).

Structure of the code

The code for the data structure is contained in SbQuantiles.h, SbQuantiles.cpp, and TreeNode.h.

Any references to theorems, lemmas, definitions, equations, figures, and sections in comments in the code refers to the corresponding piece of text in [1].

A fairly small example which show how SbQuantiles can be used.

Used for benchmarking.

Tests.

License

The MIT license, see LICENSE.txt for details. Copyright (c) 2015 Fredrik Kuivinen, frekui@gmail.com.