/NNSearch

NNSearch: A Unified NN Search Framework

Primary LanguageC++Apache License 2.0Apache-2.0

NNSearch: A Unified NN Search Framework

Due to restrictions imposed by the company's agreement, we are unable to release the internal code for building indexes and conducting the NN search. However, we now release a unified NN search framework based on the hnswlib. One advantage is that it allows for a fairer comparison of search performance with hnswlib, even if the latter employs a hierarchical indexing structure.

Features

framework

  • Support various indexing graphs (TSDG$^+$, DPG, NSG, SSG, Vamana, and HNSW), while only one universal API.
    • Automatically select the load function based on the suffix of the index graph.
    • .ivecs for k-NN graph, shortcut graph, TSDG$^+$, and DPG
    • .nsg for NSG
    • .ssg for SSG
    • .hnsw for HNSW
    • Vamana's indexing graph has no suffix.
  • Fast NN search performance and has a fair comparison with hnswlib.
    • no change to hnsw indexing and NN search.
    • our NN search = hnsw layer0's NN search + random seeds.

Compile

  • Please make sure Boost is installed. On ubuntu, it can be installed by sudo apt install libboost-all-dev.
cd hnswlib
mkdir build && cd build 
cmake .. && make -j
cd ../..

Pre-built Indexing Graphs Are Available

  • Google Drive
  • Only SIFT1M and DEEP1M have indexing graphs for all comparison methods, while other datasets only have index graphs for TSDG$^+$.

Usage

  1. Run NN search performance curve
  • This will load the indexing graph to run NN search and save records.

warning: NN search works on single-thread. It may take several minutes.

./hnswlib/build/uni_nnsearch \
--base_path=/home/jfwang/data/sift1m/sift1m_base.fvecs \
--query_path=/home/jfwang/data/sift1m/sift1m_query.fvecs \
--gt_path=/home/jfwang/data/sift1m/sift1m_gt.ivecs \
--index_path=/home/jfwang/data/sift1m/sift1m_TSDGed_K200_SavMem_XNDesc_SCG_1.2_4_50.ivecs \
>> ./nns_records/sift1m_TSDG_plus.record
  1. visualization
  • after all records are put in the ./nns_records folder. we can call draw_fig.py to show curves.
python draw_fig.py

Indexing Parameters Are Available

  • We release the parameters we used to index.
  • LSH-APG's parameters are loyal to the original implementations except the searched candidate pool size, which is set as 512. Since it cannot be integrated into this NN search framwork, its parameters are not listed here, and can be found in its literature. sift1m_para deep1m_para gist1m_para turing1m_para spacev1m_para t2i1m_para sift100m_para deep100m_para turing100m_para spacev100m_para

Note

  1. Only float datatype is supported as hnswlib does.
  2. Only the search performance curves in our paper's Fig 6 - 8 can be made. We have an un-released version code, which optimizes the memory cost during NN search.
  3. For full API, plz run ./hnswlib/build/uni_nnsearch -h
  4. Our API source codes are listed at hnswlib/examples/cpp/uni_nnsearch.cpp and hnswlib/hnswlib/dataloader.h. Have fun with it.