hll-hive-udf

Approximate cardinality estimation with HyperLogLog, as a Hive function

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

An implementation of the HyperLogLog approximate cardinality estimation algorithm (and of Linear Counting) as a Hive User-Defined Aggregation Function (UDAF).
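To illustrate what such an aggregate computes, here is a minimal, self-contained HyperLogLog sketch in Java. It is not this project's code or stream-lib's API: the class name `MiniHyperLogLog`, its methods, the FNV-1a plus fmix64 hash, and the allowed precision range are illustrative assumptions. It shows the three pieces an aggregation function needs: updating registers per row, merging partial results by register-wise maximum, and a final estimate that falls back to Linear Counting while many registers are still empty.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative HyperLogLog sketch (hypothetical class; not stream-lib's implementation). */
class MiniHyperLogLog {
    private final int p;            // number of index bits (assumed range 7..16)
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // highest rank observed per register

    MiniHyperLogLog(int p) {
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    /** 64-bit hash: FNV-1a over UTF-8 bytes, then MurmurHash3's fmix64 finalizer to spread bits. */
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** Adds one value; offering a duplicate never changes the sketch. */
    void offer(String value) {
        long x = hash(value);
        int idx = (int) (x >>> (64 - p));  // top p bits select a register
        long w = x << p;                   // remaining 64 - p bits
        int rank = Math.min(Long.numberOfLeadingZeros(w) + 1, 64 - p + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    /** Union of two sketches: register-wise maximum, independent of merge order. */
    void merge(MiniHyperLogLog other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (int j = 0; j < m; j++) {
            if (other.registers[j] > registers[j]) registers[j] = other.registers[j];
        }
    }

    long cardinality() {
        double alpha = 0.7213 / (1.0 + 1.079 / m); // bias constant, valid for m >= 128
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double estimate = alpha * m * m / sum;
        // Small-range correction: use Linear Counting over the empty registers.
        if (estimate <= 2.5 * m && zeros > 0) {
            estimate = m * Math.log((double) m / zeros);
        }
        return Math.round(estimate);
    }

    public static void main(String[] args) {
        MiniHyperLogLog a = new MiniHyperLogLog(14);
        MiniHyperLogLog b = new MiniHyperLogLog(14);
        for (int i = 0; i < 50_000; i++) a.offer("user-" + i);
        for (int i = 50_000; i < 100_000; i++) b.offer("user-" + i);
        a.merge(b); // like combining two partial aggregations
        System.out.println("estimated distinct: " + a.cardinality());
    }
}
```

With p = 14 the expected relative standard error is roughly 1.04/√16384 ≈ 0.8%. Because merging is a register-wise maximum, partial sketches built on different mappers could be combined in any order, which is the property that makes HyperLogLog a natural fit for an aggregate function.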

Relies on Clearspring's stream-lib for its implementations of these algorithms.
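Linear Counting, the other estimator named above, is simpler still: each value is hashed to one bit of an m-bit bitmap, and the count is estimated as n̂ = -m ln(V), where V is the fraction of bits still zero. The following is a hedged illustration only, not stream-lib's class; the name `MiniLinearCounting`, its methods, and the `String.hashCode` plus fmix32 hash are assumptions made for the example.

```java
import java.util.BitSet;

/** Illustrative Linear Counting estimator (hypothetical class; not stream-lib's implementation). */
class MiniLinearCounting {
    private final int m;       // bitmap size in bits
    private final BitSet bits; // bit i is set if some value hashed to i

    MiniLinearCounting(int m) {
        if (m <= 0) throw new IllegalArgumentException("m must be positive");
        this.m = m;
        this.bits = new BitSet(m);
    }

    /** String.hashCode followed by MurmurHash3's fmix32 finalizer to spread bits. */
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    void offer(String value) {
        bits.set(Math.floorMod(hash(value), m)); // floorMod keeps the index non-negative
    }

    /** Union of two bitmaps of the same size. */
    void merge(MiniLinearCounting other) {
        if (other.m != m) throw new IllegalArgumentException("size mismatch");
        bits.or(other.bits);
    }

    long cardinality() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) throw new IllegalStateException("bitmap saturated; use a larger m");
        return Math.round(-m * Math.log((double) zeros / m));
    }

    public static void main(String[] args) {
        MiniLinearCounting lc = new MiniLinearCounting(1 << 16);
        for (int i = 0; i < 10_000; i++) lc.offer("user-" + i);
        System.out.println("estimated distinct: " + lc.cardinality());
    }
}
```

Linear Counting is very accurate while the bitmap is lightly loaded, but the memory it needs grows roughly linearly with the expected cardinality, which is why HyperLogLog uses it only as a small-range correction.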

See the original project's Wiki for usage instructions.