/H2_ALSH

Accurate and Fast ALSH for Maximum Inner Product Search (KDD 2018)

Primary LanguageC++GNU General Public License v3.0GPL-3.0

H2_ALSH: Homocentric Hypersphere Asymmetric Locality-Sensitive Hashing

drawing

Introduction

This package provides several LSH-base schemes for k-Maximum Inner Product Search (k-MIPS). It is also an implementation of H2_ALSH from the paper as follows:

Qiang Huang, Guihong Ma, Jianlin Feng, Qiong Fang, and Anthony K. H. Tung. Accurate and Fast
Locality-Sensitive Hashing Scheme for Maximum Inner Product Search. In Proceedings of the 24th
ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining, pages 1561-1570, 2018.

Compilation

The package requires g++ with c++11 support. To download and compile the code, type:

git clone git@github.com/HuangQiang/H2_ALSH.git
cd H2_ALSH
make

Datasets

We use four real-life datasets Sift, Gist, Netflix, and Yahoo for comparison. The statistics of the datasets are summarized in the following table:

Datasets #Objects #Queries Dimensionality Data Size
Sift 1,000,000 1,000 128 512.0 MB
Netflix 17,770 1,000 300 21.3 MB
Yahoo 624,961 1,000 300 750.0 MB
Gist 1,000,000 1,000 960 3.8 GB

Run Experiments

Usage: alsh [OPTIONS]

This package supports 12 options to evaluate the performance of H2_ALSH, L2_ALSH,
L2_ALSH2, XBOX, Sign_ALSH, Simple_LSH and Linear_Scan for k-MIPS. The parameters
are introduced as follows.

  -alg    integer    options of algorithms (0 - 11)
  -n      integer    cardinality of dataset
  -d      integer    dimensionality of dataset and query set
  -qn     integer    number of queries
  -K      integer    number of hash tables for Sign_ALSH and Simple_LSH
  -m      integer    extra dimension for L2_ALSH, L2_ALSH2, and Sign_ALSH
  -U      float      a value in (0,1] for L2_ALSH, L2_ALSH2, and Sign_ALSH
  -c0     float      approximation ratio for NN Search (c0 > 1)
  -c      float      approximation ratio for MIP Search (0 < c < 1)
  -ds     string     address of data  set
  -qs     string     address of query set
  -ts     string     address of truth set
  -op     string     output path

We provide all scripts to repeat all experiments reported in SIGKDD 2018. A quick example is shown as follows (run H2_ALSH on Mnist):

./alsh -alg 1 -n 60000 -qn 1000 -d 50 -c0 2.0 -c 0.5 -ds data/Mnist/Mnist.ds -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.mip -op results/Mnist/

If you would like to get more information to run other algorithms, please check the scripts in the package. When you run the package, please ensure that the path for the dataset, query set, and truth set is correct. Since the package will automatically create folder for the output path, please keep the path as short as possible.

Related Publication

If you use this package for publications, please cite the paper as follows.

@inproceedings{huang2018accurate,
    title={Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search}
    author={Huang, Qiang and Ma, Guihong and Feng, Jianlin and Fang, Qiong and Tung, Anthony KH},
    booktitle={Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
    pages={1561--1570},
    year={2018},
    organization={ACM}
}