This repository contains a C++11 implementation of the data structure described in Section 4.1 of the paper
- G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava. Space- and time-efficient deterministic algorithms for biased quantiles over data streams. In ACM Principles of Database Systems (PODS), 2006.
The intended use-case is that we have a large stream of integers and we want to efficiently approximate rank and quantile queries for the stream without storing the whole stream.
The data structure supports the following operations:
Preconditions: 0 < b ≤ 31 and 0 < ε ≤ 0.5.
b is the number of bits in the universe and U = 2b is the size of the universe. In this implementation U must be less than 231 (i.e., 31 bit integers).
ε is a parameter which controls the trade-off between space usage and accuracy. The space used by the data structure is bounded by O(log(U) · 1/ε · (log(ε N) + log(U))). See rank and quantile below for the accuracy guarantees.
Precondition: x is an integer such that 0 ≤ x < U.
Insert x. Amortized time complexity: O(log(U)).
Precondition: x is an integer such that 0 ≤ x < U.
Compute the approximate rank of x. Let true_rank(x) = number of inserted items y such that y < x. Then it is guaranteed that abs(rank(x) - true_rank(x)) < ε · true_rank(x).
Precondition: 0 ≤ p ≤ 1.
Compute an approximate p quantile. It is guaranteed that
true_rank(x) · (1 − ε) ≤ pN ≤ true_rank(x + 1) · (1 + ε)
where x = quantile(p) and N is the number of items that have been inserted. quantile(p) is implemented by binary searching for a suitable x using rank(·).
The code for the data structure is contained in SbQuantiles.h
,
SbQuantiles.cpp
, and TreeNode.h
.
Any references to theorems, lemmas, definitions, equations, figures, and sections in comments in the code refers to the corresponding piece of text in [1].
A fairly small example which show how SbQuantiles can be used.
Used for benchmarking.
test.cpp
and test-big.cpp
Tests.
The MIT license, see LICENSE.txt
for details.
Copyright (c) 2015 Fredrik Kuivinen, frekui@gmail.com.