Author: Lucas Ribeiro Borges
This repository contains the Final Programming Project.
This repository implements the algorithm and data structure described in Product quantization for nearest neighbor search (Jegou, H. et al.).
This structure consists of an inverted file index that uses two quantizers, a coarse quantizer and a product quantizer, to index a dataset and provide efficient approximate nearest neighbor search.
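To illustrate how the two quantizers work together, here is a toy, pure-Python sketch of IVFADC-style indexing and search. It is not this package's implementation. The function names (encode, build_index, search), the parameters w (number of cells visited) and k (number of results), and the hand-picked codebooks are assumptions made for illustration. In the paper, both codebooks are learned with k-means. The coarse quantizer assigns each vector to an inverted list, and the product quantizer encodes the residual (the vector minus its coarse centroid) as m short sub-codes. A query is then compared against the stored codes through a per-cell distance table (asymmetric distance computation).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codeword in `codebook` closest to `vec`."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def split(vec, m):
    """Split `vec` into m contiguous sub-vectors of equal length."""
    d_sub = len(vec) // m
    return [vec[j * d_sub:(j + 1) * d_sub] for j in range(m)]


def encode(x, coarse, pq_codebooks):
    """Return (coarse cell id, PQ code of the residual x - coarse centroid)."""
    cell = nearest(x, coarse)
    residual = [xi - ci for xi, ci in zip(x, coarse[cell])]
    subs = split(residual, len(pq_codebooks))
    code = tuple(nearest(sub, cb) for sub, cb in zip(subs, pq_codebooks))
    return cell, code


def build_index(data, coarse, pq_codebooks):
    """Inverted file: one list of (vector id, PQ code) per coarse cell."""
    lists = {cell: [] for cell in range(len(coarse))}
    for idx, x in enumerate(data):
        cell, code = encode(x, coarse, pq_codebooks)
        lists[cell].append((idx, code))
    return lists


def search(q, lists, coarse, pq_codebooks, w=1, k=1):
    """Visit the w closest cells and rank their entries by asymmetric distance."""
    cells = sorted(range(len(coarse)), key=lambda c: sq_dist(q, coarse[c]))[:w]
    candidates = []
    for cell in cells:
        residual = [qi - ci for qi, ci in zip(q, coarse[cell])]
        subs = split(residual, len(pq_codebooks))
        # Lookup table: distance from each query sub-vector to every sub-codeword.
        table = [[sq_dist(sub, cw) for cw in cb] for sub, cb in zip(subs, pq_codebooks)]
        for idx, code in lists[cell]:
            # Approximate distance is a sum of table lookups; codes are never decoded.
            dist = sum(table[j][cj] for j, cj in enumerate(code))
            candidates.append((dist, idx))
    candidates.sort()
    return [idx for _, idx in candidates[:k]]


# Toy setup: 4-D vectors, 2 coarse cells, m = 2 sub-quantizers with 3 codewords each.
coarse = [[0, 0, 0, 0], [10, 10, 10, 10]]
pq_codebooks = [[[0, 0], [1, 1], [-1, -1]] for _ in range(2)]
data = [[1, 1, -1, -1], [10, 10, 10, 10], [11, 11, 9, 9], [0, 0, 0, 0]]

lists = build_index(data, coarse, pq_codebooks)
print(search([11, 11, 9, 9], lists, coarse, pq_codebooks, w=1, k=2))  # [2, 1]
```

Once the table for a cell is built, scoring each database entry costs only m lookups and additions, regardless of the vector dimension. This is what makes scanning the inverted lists cheap.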
This repository can be used both as a configurable script to evaluate how different parameters and datasets affect recall@R performance and as a library package providing an IVFADC implementation.
You will find the following in the different directories:
- configs/: Example configuration files.
- docs/: Extra documentation and specification.
- scripts/: Dataset download script.
- src/: Source code.
- tests/: Automated tests.
You will need python3 and pipenv to install this package. Check out the pipenv page for instructions on installing pipenv.
Assuming you already have python3 installed, the following steps are recommended:
pip install --user pipenv
export PIPENV_VENV_IN_PROJECT=1
pipenv sync
If you are using WSL2 and pipenv hangs, check the Troubleshooting section.
You can quickly train and populate an IVFADC index using the provided scripts.
A dataset download script is provided at scripts/download_dataset.sh; check out its README for details. For example, to download SIFT10K:
./scripts/download_dataset.sh SIFT10K
The algorithm parameters can be declared in an .ini-like file. An example and its documentation can be found in configs/.
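As a rough illustration of how an .ini-like file maps to run parameters, the sketch below parses one with Python's standard configparser module. The section and key names ([ivfadc], coarse_centroids, subquantizers, subquantizer_centroids, cells_visited, [evaluation], recall_r) and the load_params helper are hypothetical. The keys the script actually reads are documented in configs/.

```python
import configparser

# Hypothetical config contents; see configs/ for the real key names.
EXAMPLE_INI = """
[ivfadc]
coarse_centroids = 1024
subquantizers = 8
subquantizer_centroids = 256
cells_visited = 8

[evaluation]
recall_r = 1, 10, 100
"""


def load_params(text):
    """Parse the (hypothetical) config text into a dict of typed parameters."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    ivf = parser["ivfadc"]
    return {
        "coarse_centroids": ivf.getint("coarse_centroids"),
        "subquantizers": ivf.getint("subquantizers"),
        "subquantizer_centroids": ivf.getint("subquantizer_centroids"),
        "cells_visited": ivf.getint("cells_visited"),
        # Comma-separated list of R values for which recall@R is reported.
        "recall_r": [int(v) for v in parser["evaluation"]["recall_r"].split(",")],
    }


print(load_params(EXAMPLE_INI))
```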
You can run the base script with the following command:
pipenv run python src/main.py configs/siftsmall.ini
The script will report the configurations used for that run and the recall@R for the specified R values.
recall@R is the performance measure: the proportion of queries for which the true nearest neighbor is ranked within the top R results.
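The recall@R definition can be made concrete with a small helper. The function name recall_at_r and its input layout (one ranked list of returned ids per query, plus one ground-truth nearest-neighbor id per query) are assumptions for illustration and are not this package's API.

```python
def recall_at_r(ranked_ids, true_nn, r):
    """Fraction of queries whose true nearest neighbor is in the top r results."""
    if len(ranked_ids) != len(true_nn):
        raise ValueError("need one ranked list per query")
    if not true_nn:
        raise ValueError("need at least one query")
    hits = sum(1 for ids, nn in zip(ranked_ids, true_nn) if nn in ids[:r])
    return hits / len(true_nn)


# Three queries: ranked results returned by a search, and the exact nearest neighbors.
results = [[3, 7, 1], [2, 5, 9], [8, 4, 6]]
ground_truth = [3, 9, 0]
for r in (1, 2, 3):
    print(f"recall@{r} = {recall_at_r(results, ground_truth, r):.3f}")
```

Here, the first query is a hit at every R, the second becomes a hit only at R = 3, and the third never is. Recall is therefore 1/3 for R = 1 and R = 2, and 2/3 for R = 3.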
Once the package is installed, you can import it for personal use with:
from ivf_adc.IVFADC import IVFADC
Check the docstrings and docs/ for full documentation.
To run the test suite located in tests/, you will need to install the development dependencies with:
pipenv sync --dev
Once the development dependencies are installed, you can run the entire test suite with:
pipenv run pytest
To run a specific test file:
pipenv run pytest tests/<filename>
To run a specific test method:
pipenv run pytest tests/<filename>::<method_name>
To show stdout output (e.g. from print) for passing tests, use the -rP option:
pipenv run pytest -rP
- If running the project on WSL2, you might need to unset your DISPLAY environment variable for pipenv to run properly. You can do so with:
DISPLAY=