The Source Code for LSH-APG (PVLDB 2023)

Introduction

This is the source code for the algorithm described in the paper Towards Efficient Index Construction and Approximate Nearest Neighbor Search in High-Dimensional Spaces (submitted to PVLDB 2023). We call it the LG project.

Compilation

The LG project is written in C++ and can be compiled with g++ on Linux or MSVC on Windows. It uses OpenMP for parallelism.

Installation

Windows

The project can be built with Visual Studio 2019 by importing all the files in the directory ./cppCode/LSH-APG/src/.

Linux

cd ./cppCode/LSH-APG
make

The executable file, named lgo, is then in the dbLSH directory.

Usage

Command Usage

lgo datasetName

(the first parameter specifies the procedure to be executed and can be changed)

Parameter explanation

datasetName : dataset name

For example, you can run the following commands in the command line after building all the tools:

cd ./cppCode/LSH-APG
./lgo audio

Dataset

In our project, the input file (such as audio.data_new, which stores float data) uses the same format as LSHBOX. It is a binary file organized as follows:

{Bytes of the data type (int)} {The size of the vectors (int)} {The dimension of the vectors (int)} {All of the binary vectors, arranged in turn (float)}

To use your own dataset, convert it into this binary format, name the file [datasetName].data_new, and put it in the directory ./dataset.
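
The conversion can be sketched in C++ as below. This is a minimal sketch, not code from the LG project: the Dataset struct and the functions writeDataNew and readDataNew are hypothetical names introduced for illustration. It assumes that "the size of the vectors" means the number of vectors, that each header field is a 4-byte int, and that the file uses the byte order of the machine that writes it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory dataset (not part of the LG project).
struct Dataset {
    std::int32_t n = 0;         // number of vectors (assumed meaning of "size")
    std::int32_t dim = 0;       // dimension of each vector
    std::vector<float> values;  // n * dim floats, one vector after another
};

// Write the three-int header followed by the raw float payload.
void writeDataNew(const std::string& path, const Dataset& ds) {
    assert(ds.n >= 0 && ds.dim > 0);
    if (ds.values.size() != static_cast<std::size_t>(ds.n) * ds.dim)
        throw std::invalid_argument("values must hold n * dim floats");

    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);

    // Header: bytes per value, number of vectors, dimension.
    const std::int32_t header[3] = {
        static_cast<std::int32_t>(sizeof(float)), ds.n, ds.dim};
    out.write(reinterpret_cast<const char*>(header), sizeof(header));
    out.write(reinterpret_cast<const char*>(ds.values.data()),
              static_cast<std::streamsize>(ds.values.size() * sizeof(float)));
    if (!out) throw std::runtime_error("write failed: " + path);
}

// Read a file back and check that the header is plausible.
Dataset readDataNew(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::int32_t header[3] = {0, 0, 0};
    in.read(reinterpret_cast<char*>(header), sizeof(header));
    if (!in || header[0] != static_cast<std::int32_t>(sizeof(float)) ||
        header[1] < 0 || header[2] <= 0)
        throw std::runtime_error("unexpected header in " + path);

    Dataset ds;
    ds.n = header[1];
    ds.dim = header[2];
    ds.values.resize(static_cast<std::size_t>(ds.n) * ds.dim);
    in.read(reinterpret_cast<char*>(ds.values.data()),
            static_cast<std::streamsize>(ds.values.size() * sizeof(float)));
    if (!in) throw std::runtime_error("file is shorter than its header claims");
    return ds;
}
```

Under these assumptions, a dataset of n vectors with dimension dim produces a file of 12 + 4 * n * dim bytes, which is a quick sanity check before running lgo on it. Reading the file back with readDataNew is another way to confirm that the header and payload agree.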

A sample dataset, audio.data_new, is provided in the directory ./dataset. You can also download it as audio.data from here (if so, rename it to audio.data_new). If the link is invalid, you can also get it from data.

For the datasets we use, you can get the raw data from the following links: MNIST, Deep1M, GIST, TinyImages80M, SIFT. Next, transform the raw dataset into the binary format described above, rename it to [datasetName].data_new, and put it in the directory ./dataset.