/quantile

an aggregation function for PostgreSQL

Primary LanguageCOtherNOASSERTION

Quantile aggregates

This extension provides three simple aggregate functions to compute quantiles (http://en.wikipedia.org/wiki/Quantile). There are two forms of aggregate functions available - the first one returns a single quantile, the second one returns an arbitrary number of quantiles (as an array).

History

This extension was created in 2011, before PostgreSQL added functions to compute percentiles (percentile_cont and percentile_disc) in 9.4, which was released in December 2014. Even after introduction of those built-in functions it made sense to use this extension, because it was significantly faster in various cases.

The performance of the built-in functions improved a lot since then, and is usually very close or even faster than this extension. In some cases the extension is perhaps 2x faster than the built-in functions, but that may be (at least partially) attributed to not respecting work_mem.

It's therefore recommended to evaluate the built-in functions first, and only use this extension if it's provably (and consistently) faster than the built-in functions and the risk of running out of memory is low, or when it's necessary to support older PostgreSQL releases (pre-9.4) that do not have the built-in alternatives.

quantile(p_value numeric, p_quantile float)

Computes arbitrary quantile of the values - the p_quantile has to be between 0 and 1. For example this should return 500 because 500 is the median of a sequence 1 .. 1000.

SELECT quantile(i, 0.5) FROM generate_series(1,1000) s(i);

but you can choose arbitrary quantile (for example 0.95).

This function is overloaded for the four basic numeric types, i.e. int, bigint, double precision and numeric.

quantile(p_value numeric, p_quantiles float[])

If you need multiple quantiles at the same time (e.g. all four quartiles), you can use this function instead of the one described above. This version allows you to pass an array of quantiles and returns an array of values.

So if you need all three quartiles, you may do this

SELECT quantile(i, ARRAY[0.25, 0.5, 0.75])
  FROM generate_series(1,1000) s(i);

and it should return ARRAY[250, 500, 750]. Compared to calling the simple quantile function like this

SELECT quantile(i, 0.25), quantile(i, 0.5), quantile(i, 0.75)
 FROM generate_series(1,1000) s(i);

the advantage is that the values are collected just once (into a single array), not for each expression separately. If you're working with large data sets, this may save a significant amount of time and memory (if may even be the factor that allows the query to finish and not being killed by OOM killer or something).

Just as in the first case, there are four functions handling other basic numeric types: int, bigint, double precision and numeric.

Installation

Installing this is very simple, especially if you're using pgxn client. All you need to do is this:

$ pgxn install quantile
$ pgxn load -d mydb quantile

and you're done. You may also install the extension manually:

$ make install
$ psql dbname -c "CREATE EXTENSION quantile"

And if you're on an older version (pre-9.1), you have to run the SQL script manually

$ psql dbname < `pg_config --sharedir`/contrib/quantile--1.1.7.sql

That's all.

License

This software is distributed under the terms of BSD 2-clause license. See LICENSE or http://www.opensource.org/licenses/bsd-license.php for more details.