Hyperminhash for SQLite3
A Hyperminhash-extension for SQLite3, providing very fast, constant-memory cardinality approximation, including intersection- and union-operations.
... Query on an in-memory table of two million (INT, INT)-rows, no index
Query | Result | Time |
---|---|---|
SELECT COUNT(*) FROM (SELECT DISTINCT foo, bar FROM foobar) |
1,734,479 | 5028ms |
SELECT hyperminhash(foo, bar) FROM foobar |
1,728,632 (error 0.34%) | 337ms (x14.9) |
The extensions provides the following functions
-
HYPERMINHASH()
, an aggregate-function accepting up toSQLITE_LIMIT_FUNCTION_ARG
arguments; returns the approximate cardinality of the items seen as aDOUBLE
.E.g.
SELECT HYPERMINHASH(users.date, users.ip) AS unique_users FROM users;
-
HYPERMINHASH_ZERO()
, a scalar-function accepting no arguments; returns a opaqueBLOB
representing a count of zero.E.g.
INSERT INTO stats (data_point, hmh_data) VALUES ('users', HYPERMINHASH_ZERO());
-
HYPERMINHASH_SERIALIZE()
, an aggregate-function similar toHYPERMINHASH()
. Returns a opaqueBLOB
representing the approximate cardinality of the items seen.E.g.
UPDATE stats SET stats.hmh_data = (SELECT HYPERMINHASH_SERIALIZE(users.date, users.ip) FROM users) WHERE stats.data_point = 'users';
-
HYPERMINHASH_DESERIALIZE()
, a scalar-function accepting a singleBLOB
returned byHYPERMINHASH_ZERO()
,HYPERMINHASH_SERIALIZE()
,HYPERMINHASH_ADD()
orHYPERMINHASH_UNION()
. Returns the approximate cardinality as aDOUBLE
.E.g.
SELECT HYPERMINHASH_DESERIALIZE(stats.hmh_data) FROM stats WHERE stats.data_point = 'users';
-
HYPERMINHASH_UNION()
, an aggregate-function acceptingBLOB
s returned byHYPERMINHASH_ZERO()
,HYPERMINHASH_SERIALIZE()
,HYPERMINHASH_ADD()
orHYPERMINHASH_UNION()
. Returns an opaqueBLOB
representing the union-set operation over it's inputs.E.g.
SELECT HYPERMINHASH_UNION(stats.hmh_data) FROM stats WHERE stats.data_point = 'users' AND result = 'error';
-
HYPERMINHASH_ADD()
, a scalar-function accepting up toSQLITE_LIMIT_FUNCTION_ARG
, equivalent toHYPERMINHASH_UNION()
.E.g.
UPDATE stats SET stats.hmh_data = HYPERMINHASH_ADD(stats.hmh_data, (SELECT HYPERMINHASH_SERIALIZE(users.date, users.ip) FROM users WHERE users.date = DATE('now'))) WHERE stats.data_point = 'users';
-
HYPERMINHASH_INTERSECTION()
, a scalar-function accepting exactly twoBLOB
s returned byHYPERMINHASH_ZERO()
,HYPERMINHASH_SERIALIZE()
,HYPERMINHASH_ADD()
orHYPERMINHASH_UNION()
. Returns the approximate cardinality of the intersection-set operation over it's arguments as aDOUBLE
.E.g.
SELECT HYPERMINHASH_INTERSECTION((SELECT stats.hmh_data FROM stats WHERE stats.data_point = 'users'), (SELECT stats.hmh_data FROM stats FROM stats WHERE stats.data_point = 'admins'));
Building
Use Rust's package manager via cargo build --release
. A shared object file for the current platform will be placed in target/release
.
By default, only the HYPERMINHASH()
-function is available. Compile the crate with the serialize
-feature to enable the other functions, which return a static error if the serialize
-feature was not activated.