DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search
Paper link :
Boost (required components: timer chrono system program_options)
cmake .
make pqtree
make deltapq
The testing dataset can be obtained from
In each dataset folder, rename the files as follows:
- *_base.*vecs -> base.*vecs
- *_learn.*vecs -> learn.*vecs
- *_query.*vecs -> query.*vecs
For example, after downloading and extracting siftsmall.tar.gz, rename siftsmall_base.fvecs to base.fvecs.
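The renaming rule above can be scripted. The following is a minimal sketch, not part of the DeltaPQ tools: `rename_vecs` is a hypothetical helper name, and it assumes each file follows the `<name>_<role>.<ext>` pattern described above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DeltaPQ): strip the dataset-name
# prefix from the base/learn/query files inside one dataset folder.
rename_vecs() {
    dir=${1%/}
    for role in base learn query; do
        for f in "$dir"/*_"$role".*vecs; do
            [ -e "$f" ] || continue        # the glob matched nothing
            ext=${f##*.}                   # e.g. fvecs or bvecs
            mv -- "$f" "$dir/$role.$ext"
        done
    done
}
```

Usage: `rename_vecs /data/local/pqdata/siftsmall` would turn `siftsmall_base.fvecs` into `base.fvecs`, and likewise for the learn and query files. Files that do not match the pattern are left untouched.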
- Learn codewords
./pqtree -dataset [path to the dataset folder]
-ext [file type] // specify the input file type, either fvecs or bvecs
-task learn // learn codewords for PQ
-m [M] // the number of sub-dimensions in PQ
-k [K] // the number of centroids in each sub-dimension in PQ
-train_size [training size] // the number of vectors used for learning codewords
./pqtree -dataset /data/local/pqdata/sift/ -ext fvecs -task learn -m 8 -k 256 -train_size 10000
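Before choosing values for -train_size or -N, it can help to check how many vectors a file actually holds. The sketch below assumes the usual fvecs/bvecs layout, in which each vector is stored as a 4-byte little-endian integer dimension followed by that many components (4-byte floats for fvecs, single bytes for bvecs). `vec_dim` and `vec_count` are hypothetical helper names, not part of pqtree or deltapq.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the standard .fvecs/.bvecs record layout
# and a little-endian host (od reads the header in host byte order).

# Print the dimension stored in the first record of the file.
vec_dim() {
    od -An -td4 -N4 "$1" | tr -d ' \n'
}

# Print the number of vectors, derived from the file size.
vec_count() {
    file=$1
    dim=$(vec_dim "$file")
    case $file in
        *.bvecs) rec=$((4 + dim)) ;;       # 4-byte header + 1 byte per component
        *)       rec=$((4 + 4 * dim)) ;;   # 4-byte header + 4 bytes per component
    esac
    size=$(wc -c < "$file" | tr -d ' ')
    echo $((size / rec))
}
```

Usage: `vec_dim learn.fvecs` prints the vector dimension and `vec_count learn.fvecs` prints how many vectors are available for training.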
- Encode
./pqtree -dataset [path to the dataset folder]
-ext [file type] // specify the input file type, either fvecs or bvecs
-task encode // encode PQ codes
-m [M] // the number of sub-dimensions in PQ
-k [K] // the number of centroids in each sub-dimension in PQ
-N [number of vectors] // specify the number of vectors to use
./pqtree -dataset /data/local/pqdata/sift/ -ext fvecs -task encode -m 8 -k 256
- Generate approximate DeltaTree
./deltapq -dataset [path to the dataset folder]
-ext [file type] // specify the input file type, either fvecs or bvecs
-task approx_tree // generate approximate DeltaTree
-m [M] // the number of sub-dimensions in PQ
-k [K] // the number of centroids in each sub-dimension in PQ
-h [H] // controls the maximum tree height, usually set as 1
-diff [maximum weight] // the maximum weight of edges in the tree, usually set as the same as m
-N [number of vectors] // specify the number of vectors to use
./deltapq -dataset /data/local/pqdata/sift/ -ext fvecs -task approx_tree -m 8 -k 256 -h 1 -diff 8 -N 1000000
- Query
./deltapq -dataset [path to the dataset folder]
-ext [file type] // specify the input file type, either fvecs or bvecs
-task query // perform similarity queries on the deltatree
-m [M] // the number of sub-dimensions in PQ
-k [K] // the number of centroids in each sub-dimension in PQ
-h [H] // controls the maximum tree height, usually set as 1
-diff [maximum weight] // the maximum weight of edges in the tree, usually set as the same as m
-N [number of vectors] // specify the number of vectors to use
-query_size [SIZE] // the number of queries to be performed
-topk [TOPK] // the number of nearest neighbors to return for each query
./deltapq -dataset /data/local/pqdata/sift/ -ext fvecs -task query -m 8 -k 256 -h 1 -diff 8 -N 89656 -query_size 10 -topk 10 -debug
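The four tasks above can be chained in a small wrapper. This is a sketch under assumptions: it supposes the tasks are meant to run in the order they are listed here, it uses only the flags documented above, and `run` and `pipeline` are hypothetical helper names. The default values simply mirror the examples.

```shell
#!/bin/sh
# Sketch only: defaults mirror the examples in this README and can be
# overridden through the environment.
DATASET=${DATASET:-/data/local/pqdata/sift/}
EXT=${EXT:-fvecs}
M=${M:-8}; K=${K:-256}; H=${H:-1}; DIFF=${DIFF:-$M}
N=${N:-1000000}; TRAIN_SIZE=${TRAIN_SIZE:-10000}
QUERY_SIZE=${QUERY_SIZE:-10}; TOPK=${TOPK:-10}

# Hypothetical helper: with DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Learn codewords, encode, build the approximate DeltaTree, then query.
# The && chain stops at the first step that fails.
pipeline() {
    run ./pqtree -dataset "$DATASET" -ext "$EXT" -task learn \
        -m "$M" -k "$K" -train_size "$TRAIN_SIZE" &&
    run ./pqtree -dataset "$DATASET" -ext "$EXT" -task encode \
        -m "$M" -k "$K" -N "$N" &&
    run ./deltapq -dataset "$DATASET" -ext "$EXT" -task approx_tree \
        -m "$M" -k "$K" -h "$H" -diff "$DIFF" -N "$N" &&
    run ./deltapq -dataset "$DATASET" -ext "$EXT" -task query \
        -m "$M" -k "$K" -h "$H" -diff "$DIFF" -N "$N" \
        -query_size "$QUERY_SIZE" -topk "$TOPK"
}
```

Usage: after sourcing the file, `DRY_RUN=1 pipeline` prints the four command lines for inspection, and `pipeline` runs them from the build directory.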
Go to the dataset folder and
mkdir groundtruth
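The folder can also be created without changing directory, and in a way that is safe to repeat. A minimal sketch, where `ensure_groundtruth_dir` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: create <dataset folder>/groundtruth if it is missing.
# mkdir -p succeeds silently when the folder already exists.
ensure_groundtruth_dir() {
    mkdir -p -- "${1%/}/groundtruth"
}
```

Usage: `ensure_groundtruth_dir /data/local/pqdata/sift/`.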
- Generate groundtruth
./deltapq -dataset [path to the dataset folder]
-ext [file type] // specify the input file type, either fvecs or bvecs
-task groundtruth // generate groundtruth
-query_size [SIZE] // the number of queries to be performed
-topk [TOPK] // the number of nearest neighbors to return for each query
-N [number of vectors] // specify the number of vectors to use
./pqtree -dataset /data/local/pqdata/sift/ -ext fvecs -task groundtruth -topk 10000 -query_size 1000 -N 1000000