
Automatic Query Image Disambiguation (AID)

This repository contains the reference implementation of AID and code that can be used to reproduce the results from the corresponding paper:

Björn Barz and Joachim Denzler.
"Automatic Query Image Disambiguation for Content-based Image Retrieval."
International Conference on Computer Vision Theory and Applications (VISAPP), 2018.

If you use AID, please cite that paper.

What is AID?

[Figure: AID workflow schema]

AID is a novel recommendation and re-ranking technique for content-based image retrieval (CBIR). Query images presented to a CBIR system are usually ambiguous, so user feedback is crucial for refining the search results towards the search objective actually pursued by the user.

Instead of asking the user to mark multiple relevant and irrelevant images among the initial search results one by one, AID automatically discovers different meanings of the query image by clustering the top search results and then asks the user to simply select the cluster that seems most relevant to them. This way, the user's effort is minimized.

Many similar methods restrict the set of refined results to the selected cluster. However, such an approach is sub-optimal, because the final set of results would be a strict subset of the initial set used for clustering, even though there may be relevant images that did not appear among the top results. AID, in contrast, applies a global re-ranking of all images in the database with respect to both the cluster selected by the user and the similarity to the initial query image.
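
The following sketch outlines this idea in a few lines of Python. It is not the reference implementation from this repository; the use of k-means, cosine similarity on L2-normalized features, and the weighting parameter alpha are merely illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans

def aid_rerank_sketch(query_feat, db_feats, top_k=100, n_clusters=5, alpha=0.5):
    # Step 1: baseline retrieval by cosine similarity
    # (query_feat and the rows of db_feats are assumed to be L2-normalized).
    sims = db_feats @ query_feat
    top_idx = np.argsort(-sims)[:top_k]

    # Step 2: cluster the top results to surface different meanings of the query.
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit(db_feats[top_idx])

    # Step 3: the user selects the cluster that matches their search objective.
    selected = 0  # placeholder for the user's choice
    centroid = clusters.cluster_centers_[selected]
    centroid /= np.linalg.norm(centroid)

    # Step 4: global re-ranking of *all* database images with respect to both
    # the selected cluster and the similarity to the original query.
    combined = alpha * (db_feats @ centroid) + (1 - alpha) * sims
    return np.argsort(-combined)

Because the combined score is computed over the entire database, relevant images that were not among the initial top results can still enter the refined ranking.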

For details, please refer to the paper mentioned above.

Dependencies

Mandatory
  • Python >= 3.3
  • numpy
  • scipy
  • scikit-learn
Optional
  • caffe & pycaffe (required if you want to extract the image features yourself)
  • tqdm (for progress bars)
  • matplotlib (if you would like to generate graphs for Precision@k)
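
The pure Python dependencies can usually be installed via pip; caffe and pycaffe need to be set up separately according to the Caffe installation instructions:

pip install numpy scipy scikit-learn tqdm matplotlib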

Reproducing the results from the paper

Getting the features

Before you can actually run the benchmark of the different query image disambiguation methods, you need to compute features for the images in the dataset. You can either download a .npy file with pre-computed features (49 MB) for the MIRFLICKR dataset, or extract the features yourself as follows:

  1. Download the MIRFLICKR-25K dataset (2.9 GB).
  2. Extract the downloaded archive inside the mirflickr directory of this repository, so that you end up with a second mirflickr directory inside the top-level mirflickr directory.
  3. Download the pre-trained weights of the VGG 16 model (528 MB) and store them in the model directory.
  4. From the root directory of the repository, run: python extract_features.py
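
Either way, you end up with a feature matrix that can be inspected with numpy. The file name below is only an example; use whatever path your downloaded or extracted feature file has:

import numpy as np

features = np.load('features.npy')      # example path, adjust to your file
print(features.shape, features.dtype)   # one row per MIRFLICKR-25K image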

Running the benchmark

Once you have downloaded or extracted the features of the dataset images, you can run the benchmark as follows:

python evaluate_query_disambiguation.py --plot_precision

See python evaluate_query_disambiguation.py --help for the full list of options.

The result should be similar to the following:

            |   AP   |  P@1   |  P@10  |  P@50  | P@100  |  NDCG  | NDCG@100
----------------------------------------------------------------------------
Baseline    | 0.4201 | 0.6453 | 0.6191 | 0.5932 | 0.5790 | 0.8693 |   0.5869
CLUE        | 0.4221 | 0.8301 | 0.7829 | 0.6466 | 0.5978 | 0.8722 |   0.6306
Hard-Select | 0.4231 | 0.8138 | 0.8056 | 0.6773 | 0.6116 | 0.8727 |   0.6450
AID         | 0.5188 | 0.8263 | 0.7950 | 0.7454 | 0.7212 | 0.8991 |   0.7351

The baseline results should match exactly, while deviations may occur in the other rows due to randomization.

However, running the benchmark on the entire MIRFLICKR-25K dataset may take about a week and requires a large amount of RAM. If you would like to perform a slightly faster consistency check, you can instead run the evaluation on a set of 70 pre-defined queries (5 for each topic):

python evaluate_query_disambiguation.py --query_dir mirflickr --rounds 10 --show_sd --plot_precision

In that case, the results should be similar to:

            |   AP   |  P@1   |  P@10  |  P@50  | P@100  |  NDCG  | NDCG@100
----------------------------------------------------------------------------
Baseline    | 0.3753 | 0.7286 | 0.6800 | 0.6100 | 0.5664 | 0.8223 |   0.5880
CLUE        | 0.3810 | 0.9100 | 0.8133 | 0.6462 | 0.5816 | 0.8290 |   0.6232
Hard-Select | 0.3849 | 0.8457 | 0.8469 | 0.6846 | 0.6011 | 0.8314 |   0.6426
AID         | 0.4625 | 0.8757 | 0.8206 | 0.7211 | 0.6711 | 0.8531 |   0.6991


Standard Deviation:

            |   AP   |  P@1   |  P@10  |  P@50  | P@100  |  NDCG  | NDCG@100
----------------------------------------------------------------------------
Baseline    | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |   0.0000
CLUE        | 0.0005 | 0.0239 | 0.0074 | 0.0045 | 0.0033 | 0.0003 |   0.0037
Hard-Select | 0.0006 | 0.0270 | 0.0068 | 0.0072 | 0.0031 | 0.0005 |   0.0039
AID         | 0.0053 | 0.0203 | 0.0087 | 0.0085 | 0.0088 | 0.0017 |   0.0075
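
For reference, the P@k and NDCG@k columns in the tables above follow the usual definitions with binary relevance, sketched below. This is only meant to clarify the metrics; the evaluation script may differ in details:

import numpy as np

def precision_at_k(ranking, relevant, k):
    # Fraction of the top-k retrieved images that are relevant.
    return sum(1 for img in ranking[:k] if img in relevant) / k

def ndcg_at_k(ranking, relevant, k):
    # Discounted cumulative gain over the top k, normalized by the ideal DCG.
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    gains = np.array([1.0 if img in relevant else 0.0 for img in ranking[:k]])
    dcg = float(np.sum(gains * discounts))
    ideal_dcg = float(np.sum(discounts[:min(len(relevant), k)]))
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0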