This package provides two internal LSH schemes RQALSH and RQALSH* for high-dimensional c-Approximate Furthest Neighbor (c-AFN)
search from the following two papers:
Qiang Huang, Jianlin Feng, Qiong Fang. Reverse Query-Aware Locality-Sensitive
Hashing for High-Dimensional Furthest Neighbor Search. 2017 IEEE 33rd International
Conference on Data Engineering (ICDE), pages 167-170, 2017.
Qiang Huang, Jianlin Feng, Qiong Fang, Wilfred Ng. Two Efficient Hashing Schemes for
High-Dimensional Furthest Neighbor Search. IEEE Transactions on Knowledge and Data
Engineering (TKDE), 29(12): 2772–2785, 2017.
In this version, we also introduce a new algorithm ML_RQALSH
for c-AFN search. It enjoys theoretical guarantee like RQALSH and runs as fast as RQALSH* without parameters tunning.
The package requires g++
with c++11
support. To download and compile the code, type:
$ git clone https://github.com/HuangQiang/RQALSH_Mem.git
$ cd RQALSH_Mem
$ make
We use four real-life datasets Sift, Gist, Trevi, and P53 for comparison. We randomly remove 1,000 data objects from each dataset and use them as queries. The statistics of datasets and queries are summarized in the following table:
Datasets | #Objects | #Queries | Dimensionality | Domain Size | Data Size |
---|---|---|---|---|---|
Sift | 1,000,000 | 1000 | 128 | [0, 218] | 337.8 MB |
Gist | 1,000,000 | 1000 | 960 | [0, 14,772] | 4.0 GB |
Trevi | 100,900 | 1000 | 4,096 | [0, 255] | 1.5 GB |
P53 | 31,159 | 1000 | 5,408 | [0, 10,000] | 833.7 MB |
Usage: rqalsh [OPTIONS]
This package supports 7 options to evaluate the performance of RQALSH, RQALSH*,
ML_RQALSH, QDAFN, Drusilla_Select, and Linear_Scan for c-AFN search. The parameters
are introduced as follows.
-alg integer options of algorithms (0 - 6)
-n integer cardinality of dataset
-d integer dimensionality of dataset and query set
-qn integer number of queries
-L integer number of projections for RQALSH*, QDAFN*, Drusilla_Select
-M integer number of candidates for RQALSH*, QDAFN*, Drusilla_Select
-c float approximation ratio for c-AFN search (c > 1)
-ds string address of data set
-qs string address of query set
-ts string address of truth set
-op string output path
We provide the scripts to run experiments. A quick example is shown as follows (run ML_RQALSH, RQALSH* and RQALSH on Mnist
):
# RQALSH
./rqalsh -alg 4 -n 59000 -qn 1000 -d 50 -c 2.0 -ds data/Mnist/Mnist.ds -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.fn2.0 -op results2.0/Mnist/
# RQALSH*
./rqalsh -alg 5 -n 59000 -qn 1000 -d 50 -L 1000 -M 4 -c 2.0 -ds data/Mnist/Mnist.ds -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.fn2.0 -op results2.0/Mnist/
# ML_RQALSH
./rqalsh -alg 6 -n 59000 -qn 1000 -d 50 -c 2.0 -ds data/Mnist/Mnist.ds -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.fn2.0 -op results2.0/Mnist/
If you would like to get more information to run other algorithms, please check the scripts in the package. When you run the package, please ensure that the path for the dataset, query set, and truth set is correct. Since the package will automatically create folder for the output path, please keep the path as short as possible.
If you use this package for publications, please cite the papers as follows.
@inproceedings{huang2017reverse,
title={Reverse query-aware locality-sensitive hashing for high-dimensional furthest neighbor search}
author={Huang, Qiang and Feng, Jianlin and Fang, Qiong},
booktitle={2017 IEEE 33rd International Conference on Data Engineering (ICDE)},
pages={167--170},
year={2017},
organization={IEEE}
}
@article{huang2017two,
title={Two efficient hashing schemes for high-dimensional furthest neighbor search}
author={Huang, Qiang and Feng, Jianlin and Fang, Qiong and Ng, Wilfred},
booktitle={IEEE Transactions on Knowledge and Data Engineering},
volumn={29},
number={12},
pages={2772--2785},
year={2017},
organization={IEEE}
}