Rayuela.jl

This is the code for the paper

Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little. LSQ++: Lower running time and higher recall in multi-codebook quantization. In ECCV 2018. [CVF].

Rayuela.jl is a package that implements non-orthogonal multi-codebook quantization methods (MCQ). These methods are useful for fast search of high-dimensional (d=100s-1000s) dense vectors. Rayuela is written in Julia.
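To give a flavour of the approach (this is a toy sketch of the general MCQ idea, not Rayuela's API), a d-dimensional vector is approximated as the sum of one codeword from each of m codebooks, so it can be stored as just m small integer codes. The greedy, residual-style encoder below is only for illustration; the methods in this package search for much better codes.

# Toy MCQ sketch (illustrative only, not Rayuela's API).
d, h, m = 128, 256, 8                          # dimensionality, codebook size, number of codebooks
codebooks = [randn(Float32, d, h) for _ in 1:m]
x = randn(Float32, d)                          # a vector to be quantized

# Greedy, residual-style encoding: for each codebook in turn, pick the
# codeword closest to the current residual.
codes = Vector{Int}(undef, m)
r = copy(x)
for i in 1:m
    dists = vec(sum(abs2, codebooks[i] .- r, dims=1))
    codes[i] = argmin(dists)
    r .-= codebooks[i][:, codes[i]]
end

x_rec = sum(codebooks[i][:, codes[i]] for i in 1:m)   # reconstruction from the m codes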

This is not a production-ready library -- if you are looking for something like that, maybe look at faiss. Do note that the methods implemented by faiss and Rayuela.jl are almost entirely orthogonal, and that the libraries are distributed under different licenses (although, as of May 27, 2019, faiss is also MIT-licensed).

Rayuela implements the main contributions that I made to this problem during my PhD at UBC, as well as multiple baselines for MCQ. The package is my attempt to make my research reproducible and accessible, and to make it easier for other people, especially newcomers, to contribute to this field, where the lack of reproducibility is a major barrier to entry IMO.

I originally intended to incorporate these contributions on top of faiss (see faiss/#185), but I soon realized that doing so would considerably delay the completion of my PhD. Julia is also more accessible (albeit a bit less performant) for quickly trying and testing new research ideas. In the future, savvier C++ programmers may port the most useful methods to faiss.

Authors

The code in this package was written by Julieta Martinez, with help from Joris Clement and Shobhit Zakhmi.

Requirements

This package is written in Julia 1.0, with some extensions in C++ and CUDA. You also need a CUDA-ready GPU. We have tested this code on an Nvidia Titan Xp GPU.

Installing

Before all else, make sure that you have the g++ compiler available from the command line, and the nvcc compiler available at the path /usr/local/cuda/bin/nvcc.
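If you want to verify this from within Julia, a quick sanity check (illustrative, not part of Rayuela's build) might look like:

Sys.which("g++") === nothing && @warn "g++ not found on PATH"
isfile("/usr/local/cuda/bin/nvcc") || @warn "nvcc not found at /usr/local/cuda/bin/nvcc"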

Then, open julia and type ] to enter Package mode:

julia>
(v1.0) pkg>

Now you can clone our repo:

(v1.0) pkg> develop https://github.com/una-dinosauria/Rayuela.jl.git

This should put our code under ~/.julia/dev/Rayuela.

Due to an open bug with the package manager, you have to manually pull the latest changes:

cd ~/.julia/dev/Rayuela
git pull
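If everything went well, the package should now load from a fresh Julia session (the first load may take a while while dependencies precompile):

julia> using Rayuela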

Demo and data

You may explore the library with SIFT1M, a classical retrieval dataset:

mkdir data
cd data
wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
tar -xvzf sift.tar.gz
rm sift.tar.gz
cd ..
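The demos load these files for you, but if you want to inspect the data yourself, the .fvecs format is simple: each vector is stored as an Int32 dimensionality followed by that many Float32 values. Here is a minimal reader for illustration (a hypothetical helper, not Rayuela's own loader), assuming the commands above left the files under data/sift/:

# Read an .fvecs file into a d-by-n matrix (illustrative helper, not Rayuela's loader).
function read_fvecs(path::AbstractString)
    open(path, "r") do io
        d = Int(read(io, Int32))                 # dimensionality, taken from the first vector
        seekstart(io)
        n = div(filesize(path), 4 * (d + 1))     # number of vectors in the file
        X = Matrix{Float32}(undef, d, n)
        buf = Vector{Float32}(undef, d)
        for i in 1:n
            read(io, Int32)                      # skip the per-vector dimension field
            read!(io, buf)
            X[:, i] = buf
        end
        return X
    end
end

queries = read_fvecs("data/sift/sift_query.fvecs")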

Also, make a directory for the results:

mkdir -p results/sift1m

Finally, run the demo:

julia> include("~/.julia/dev/Rayuela/demos/demos_train_query_base.jl")

For the train/query/base protocol (by default, the example runs on SIFT1M), or

julia> include("~/.julia/dev/Rayuela/demos/demos_query_base.jl")

For the query/base protocol (by default, the example runs on LabelMe22K).

This will showcase PQ, OPQ, RVQ, ERVQ, ChainQ and LSQ/LSQ++ (SR-C and SR-D).
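Retrieval quality in this line of work is typically measured with recall@N: the fraction of queries whose true nearest neighbour appears among the top N approximate neighbours returned. A minimal sketch of the metric (not Rayuela's evaluation code), where approx_ids holds the ranked neighbour ids for each query as columns:

# recall@N: fraction of queries whose true neighbour is in the top N results.
function recall_at_N(approx_ids::Matrix{Int}, true_ids::Vector{Int}, N::Int)
    hits = 0
    for (q, t) in enumerate(true_ids)
        hits += t in view(approx_ids, 1:N, q)
    end
    return hits / length(true_ids)
end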

The rest of the datasets used in our ECCV'18 publication can be found on gdrive.

Roadmap

Implemented

TODO

Things I'd like to get around to implementing / porting / wrapping some day (PRs are welcome!)

TODO (no code, low priority)

I would like to implement these methods. Some of them report really good results but, to the best of my knowledge, the authors have never released code. Also, my time is not infinite so ¯\_(ツ)_/¯

  • Sparse Composite Quantization -- CVPR'15
  • Tree Quantization with Gurobi optimization -- CVPR'15
  • Joint K-means quantization -- ICPR'16 (pay-walled)
  • Pyramid encoding quantization -- EUSIPCO'17
  • Arborescence coding -- ICCV'17

Citing

If you find this code useful, consider citing our work:

Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little. "LSQ++:
Lower running time and higher recall in multi-codebook quantization",
ECCV 2018.

or

Julieta Martinez. "Algorithms for Large-Scale Multi-Codebook Quantization".
PhD thesis, 2018. (Coming soon)

As well as the corresponding paper for the method that you are using (see above).

Notes

1: The original implementation of Tree Quantization requires a mixed-integer programming solver such as Gurobi for updating the codebooks. We implement a special version of TQ that always creates a chain (not a general tree); thus, encoding can be done with the Viterbi algorithm, and the codebook update is simpler and faster. This method should have been a baseline in the TQ paper IMO.
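For reference, the dynamic program behind chain encoding is the standard min-sum Viterbi recursion. The sketch below is a generic version over a chain of m discrete variables with given unary and pairwise costs; it is illustrative and independent of how Rayuela stores its codebooks:

# Min-sum Viterbi over a chain. unary[i][k] is the cost of assigning state k to
# variable i; pairwise[i][k, l] is the cost of states (k, l) on variables (i, i+1).
function viterbi(unary::Vector{Vector{Float64}}, pairwise::Vector{Matrix{Float64}})
    m, h = length(unary), length(unary[1])
    cost = copy(unary[1])             # best cost ending in each state of variable 1
    back = Matrix{Int}(undef, h, m)   # back-pointers

    for i in 2:m
        newcost = similar(cost)
        for l in 1:h
            best, argbest = Inf, 1
            for k in 1:h
                c = cost[k] + pairwise[i-1][k, l]
                if c < best
                    best, argbest = c, k
                end
            end
            newcost[l] = best + unary[i][l]
            back[l, i] = argbest
        end
        cost = newcost
    end

    # Backtrack from the best final state to recover the optimal assignment.
    codes = Vector{Int}(undef, m)
    codes[m] = argmin(cost)
    for i in m:-1:2
        codes[i-1] = back[codes[i], i]
    end
    return codes
end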

License

MIT