/RQALSH_Mem

Memory Version of RQALSH for Furthest Neighbor Search (TKDE 2017)

Primary LanguageC++GNU General Public License v3.0GPL-3.0

RQALSH_Mem: Memory Version of RQALSH

Introduction

This package provides two internal LSH schemes RQALSH and RQALSH* for high-dimensional c-Approximate Furthest Neighbor (c-AFN) search from the following two papers:

Qiang Huang, Jianlin Feng, Qiong Fang. Reverse Query-Aware Locality-Sensitive
Hashing for High-Dimensional Furthest Neighbor Search. 2017 IEEE 33rd International 
Conference on Data Engineering (ICDE), pages 167-170, 2017.

Qiang Huang, Jianlin Feng, Qiong Fang, Wilfred Ng. Two Efficient Hashing Schemes for 
High-Dimensional Furthest Neighbor Search. IEEE Transactions on Knowledge and Data 
Engineering (TKDE), 29(12): 2772–2785, 2017.

In this version, we also introduce a new algorithm ML_RQALSH for c-AFN search. It enjoys theoretical guarantee like RQALSH and runs as fast as RQALSH* without parameters tunning.

Compilation

The package requires g++ with c++11 support. To download and compile the code, type:

$ git clone https://github.com/HuangQiang/RQALSH_Mem.git
$ cd RQALSH_Mem
$ make

Datasets

We use four real-life datasets Sift, Gist, Trevi, and P53 for comparison. We randomly remove 1,000 data objects from each dataset and use them as queries. The statistics of datasets and queries are summarized in the following table:

Datasets #Objects #Queries Dimensionality Domain Size Data Size
Sift 1,000,000 1000 128 [0, 218] 337.8 MB
Gist 1,000,000 1000 960 [0, 14,772] 4.0 GB
Trevi 100,900 1000 4,096 [0, 255] 1.5 GB
P53 31,159 1000 5,408 [0, 10,000] 833.7 MB

Run Experiments

Usage: rqalsh [OPTIONS]

This package supports 7 options to evaluate the performance of RQALSH, RQALSH*,
ML_RQALSH, QDAFN, Drusilla_Select, and Linear_Scan for c-AFN search. The parameters
are introduced as follows.

  -alg    integer    options of algorithms (0 - 6)
  -n      integer    cardinality of dataset
  -d      integer    dimensionality of dataset and query set
  -qn     integer    number of queries
  -L      integer    number of projections for RQALSH*, QDAFN*, Drusilla_Select
  -M      integer    number of candidates  for RQALSH*, QDAFN*, Drusilla_Select
  -c      float      approximation ratio for c-AFN search (c > 1)
  -ds     string     address of data  set
  -qs     string     address of query set
  -ts     string     address of truth set
  -op     string     output path

We provide the scripts to run experiments. A quick example is shown as follows (run ML_RQALSH, RQALSH* and RQALSH on Mnist):

# RQALSH
./rqalsh -alg 4 -n 59000 -qn 1000 -d 50 -c 2.0 -ds data/Mnist/Mnist.ds -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.fn2.0 -op results2.0/Mnist/

# RQALSH*
./rqalsh -alg 5 -n 59000 -qn 1000 -d 50 -L 1000 -M 4 -c 2.0 -ds data/Mnist/Mnist.ds -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.fn2.0 -op results2.0/Mnist/

# ML_RQALSH
./rqalsh -alg 6 -n 59000 -qn 1000 -d 50 -c 2.0 -ds data/Mnist/Mnist.ds -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.fn2.0 -op results2.0/Mnist/

If you would like to get more information to run other algorithms, please check the scripts in the package. When you run the package, please ensure that the path for the dataset, query set, and truth set is correct. Since the package will automatically create folder for the output path, please keep the path as short as possible.

Related Publications

If you use this package for publications, please cite the papers as follows.

@inproceedings{huang2017reverse,
    title={Reverse query-aware locality-sensitive hashing for high-dimensional furthest neighbor search}
    author={Huang, Qiang and Feng, Jianlin and Fang, Qiong},
    booktitle={2017 IEEE 33rd International Conference on Data Engineering (ICDE)},
    pages={167--170},
    year={2017},
    organization={IEEE}
}

@article{huang2017two,
    title={Two efficient hashing schemes for high-dimensional furthest neighbor search}
    author={Huang, Qiang and Feng, Jianlin and Fang, Qiong and Ng, Wilfred},
    booktitle={IEEE Transactions on Knowledge and Data Engineering},
    volumn={29},
    number={12},
    pages={2772--2785},
    year={2017},
    organization={IEEE}
}