/rehashing

fast kernel evaluation in high dimensions via hashing

Primary LanguageC++MIT LicenseMIT

Hashing-Based-Estimators (HBE)

HBE is a C++ library for fast kernel evaluation for high-dimensional data that also includes a python implementation for illustration purposes. HBE uses Locality Sensitive Hashing (LSH) to produce provably accurate estimates of the kernel density for a given query point as well as weighted generalizations thereof. HBE is designed for radially decreasing kernel functions (e.g. Gaussian or Exponential kernels) and truly high dimensional data where FIGTree and Dual-Tree algorithms for fast kernel evaluation are slow.

Currently, HBE supports one LSH family: Eucledian LSH, introduced by (Datar, Immorlica, Indyk, Mirrokni SoCG'04) in the context of solving the Nearest Neighbor Search problem for the Euclidean distance metric (see also the E2LSH package by Alex Andoni).

How to use HBE

The first step to use HBE is to consult the python demo that describes the tuning process. Alternatively you can directly consult the C++ documentation. In our Wiki you can also find descriptions of how to construct tunable synthetic benchmarks for kernel evaluation as well as produce dataset visualizations.

How fast is HBE?

The speed of HBE depends on the desired relative error, the kernel and the dataset. For relative error 0.1 under the Gaussian kernel and datasets around 1M points HBE takes less than 25ms per query and very often around ~2ms. For specific datasets and comparison with competing methods (FIGTree, ASKIT, Random Sampling (RS)) see Table 1.

Table 1

If you want to gain intuition about what properties of datasets affect the performance of kernel evaluation algorithms see our Synthetic Benchmarks page.

Authors

HBE is mainly developed by Kexin Rong and Paris Siminelakis. HBE has grown out of research projects with our collaborators Peter Bailis, Moses Charikar and Philip Levis.

If you want to cite HBE in a publication, here is the bibliographic information of our research papers where the algorithms are described and analyzed:

Rehashing Kernel Evaluation in High Dimensions. Paris Siminelakis, Kexin Rong, Peter Bailis, Moses Charikar, Philip Levis. ICML 2019

Hashing-Based-Estimators for Kernel Density in High Dimensions. Moses Charikar, Paris Siminelakis, FOCS 2017.

License

HBE is available under the MIT License (see LICENSE.txt)