This package provides two external LSH schemes RQALSH and RQALSH* for high-dimensional c-Approximate Furthest Neighbor Search.
The package requires g++
with c++11
support. To download and compile the code, type:
git clone git@github.com/HuangQiang/RQALSH.git
cd RQALSH
make -j
We use four real-life datasets Sift, Gist, Trevi, and P53 for comparison. We randomly remove 1,000 data objects from each dataset and use them as queries. The statistics of datasets and queries are summarized in the following table:
Datasets | #Objects | #Queries | Dimensionality | Page Size | Domain Size | Data Size |
---|---|---|---|---|---|---|
Sift | 1,000,000 | 1000 | 128 | 4 KB | [0, 218] | 337.8 MB |
Gist | 1,000,000 | 1000 | 960 | 16 KB | [0, 14,772] | 4.0 GB |
Trevi | 100,900 | 1000 | 4,096 | 64 KB | [0, 255] | 1.5 GB |
P53 | 31,159 | 1000 | 5,408 | 64 KB | [0, 10,000] | 833.7 MB |
Usage: qalsh [OPTIONS]
This package supports 10 options to evaluate the performance of RQALSH, RQALSH*,
QDAFN, QDAFN*, Drusilla_Select, and Linear_Scan for c-AFN search. The parameters
are introduced as follows.
-alg integer options of algorithms (0 - 9)
-n integer cardinality of dataset
-d integer dimensionality of dataset and query set
-qn integer number of queries
-B integer page size
-L integer number of projections for RQALSH*, QDAFN*, Drusilla_Select
-M integer number of candidates for RQALSH*, QDAFN*, Drusilla_Select
-beta integer number of false positives
-delta float error probability
-c float approximation ratio for c-AFN search (c > 1)
-ds string address of data set
-qs string address of query set
-ts string address of truth set
-df string data folder to store new format of data
-of string output folder to store output results
We provide the scripts to repeat experiments reported in TKDE 2017. A quick example is shown as follows (run RQALSH* and RQALSH on Mnist
):
# RQALSH*
./rqalsh -alg 1 -n 59000 -d 50 -B 4096 -L 1000 -M 3 -beta 100 -delta 0.49 -c 2.0 -ds data/Mnist/Mnist.ds -df data/Mnist/ -of results/rqalsh_star/
./rqalsh -alg 2 -qn 1000 -d 50 -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.fn2.0 -df data/Mnist/ -of results/rqalsh_star/
# RQALSH
./rqalsh -alg 3 -n 59000 -d 50 -B 4096 -beta 100 -delta 0.49 -c 2.0 -ds data/Mnist/Mnist.ds -df data/Mnist/ -of results/rqalsh/
./rqalsh -alg 4 -qn 1000 -d 50 -qs data/Mnist/Mnist.q -ts data/Mnist/Mnist.fn2.0 -df data/Mnist/ -of results/rqalsh/
If you would like to get more information to run other algorithms, please check the scripts in the package. When you run the package, please ensure that the path for the dataset, query set, and truth set is correct. Since the package will automatically create folder for the output path, please keep the path as short as possible.
If you use this package for publications, please cite the papers as follows.
@inproceedings{huang2017reverse,
title={Reverse query-aware locality-sensitive hashing for high-dimensional furthest neighbor search}
author={Huang, Qiang and Feng, Jianlin and Fang, Qiong},
booktitle={2017 IEEE 33rd International Conference on Data Engineering (ICDE)},
pages={167--170},
year={2017},
organization={IEEE}
}
@article{huang2017two,
title={Two efficient hashing schemes for high-dimensional furthest neighbor search}
author={Huang, Qiang and Feng, Jianlin and Fang, Qiong and Ng, Wilfred},
booktitle={IEEE Transactions on Knowledge and Data Engineering},
volumn={29},
number={12},
pages={2772--2785},
year={2017},
organization={IEEE}
}