tsattler/geometric_burstiness

The computation of self-similarity of database image


In the function DetermineDBImageSelfSimilarities() in inverted_file.h, the self-similarity of each database image is computed and used as the normalization factor for the final similarity score between the query and database images.

However, the computation appears to simply accumulate the squared idf weight once per visual word entry. The relevant code is as follows.

    for (int i = 0; i < num_entries; ++i) {
      current_image_id = entries_[i].image_id;
      score_ref[current_image_id] += idf_squared;
    }

This ignores the correlation between repeated occurrences of the same visual word within a database image, which violates the original definition of self-similarity, under which the similarity score of an image with itself should be 1, as described in the ACCV 2014 DisLoc paper.
Instead, we modified the computation of the self-similarity as follows.

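    // Count how often this visual word occurs in each database image.
    int num_score = score_ref.size();
    std::vector<double> num_vw(num_score, 0.0);
    for (int i = 0; i < num_entries; ++i) {
      current_image_id = entries_[i].image_id;
      num_vw[current_image_id] += 1;
    }
    // Add the squared count, weighted by the squared idf, to each image.
    for (int i = 0; i < num_score; ++i) {
      score_ref[i] += num_vw[i] * num_vw[i] * idf_squared;
    }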
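
To make the difference between the two variants concrete, here is a minimal, self-contained sketch for a single visual word. The `Entry` struct and the function names `AccumulateLinear` and `AccumulateQuadratic` are hypothetical stand-ins for illustration and are not taken from inverted_file.h; only the two loop bodies mirror the snippets above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one inverted-file entry of a single visual word.
// The real entry type in inverted_file.h is assumed to carry more fields.
struct Entry {
  int image_id;
};

// Variant from the public code: idf^2 is added once per entry, so a word
// occurring n times in an image contributes n * idf^2.
void AccumulateLinear(const std::vector<Entry>& entries, double idf_squared,
                      std::vector<double>* score_ref) {
  for (const Entry& e : entries) {
    (*score_ref)[e.image_id] += idf_squared;
  }
}

// Proposed variant: count the occurrences per image first, then add
// n^2 * idf^2, i.e. one idf^2 term for every pair of features in the
// image that share this visual word.
void AccumulateQuadratic(const std::vector<Entry>& entries, double idf_squared,
                         std::vector<double>* score_ref) {
  std::vector<double> num_vw(score_ref->size(), 0.0);
  for (const Entry& e : entries) {
    num_vw[e.image_id] += 1.0;
  }
  for (std::size_t i = 0; i < score_ref->size(); ++i) {
    (*score_ref)[i] += num_vw[i] * num_vw[i] * idf_squared;
  }
}
```

For a toy word with idf_squared = 4.0 that occurs three times in image 0 and once in image 1, the first variant adds 12 and 4 to the two images, while the second adds 36 and 4. The variants therefore only differ for images in which a visual word is repeated.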

I found that this modification improves recall in my experiments on the Pittsburgh 250k dataset: recall@1 increases from 0.508 to 0.527 without the spatial verification step.
I'm not sure whether this is a bug or whether the computation in the public code is the intended definition of self-similarity.


1. Could you tell me the detailed recall values, such as recall@1, after initial retrieval without spatial verification in your implementation?
I have evaluated two different feature extraction approaches: one from hesaff (https://github.com/perdoch/hesaff), and another from VGG with a low cornerness threshold (http://www.robots.ox.ac.uk/~vgg/research/affine/detectors.html#binaries).
The latter, with a threshold of 100, extracts about 260M local features, and the former extracts about 218M local features.
However, with either feature extractor, I have not been able to reach the recall reported in the ACCV 2014 paper and in your paper on both place recognition datasets.

2. Could you tell me how you converted the JPEG images to PPM images? I use the jpegtopnm command on Linux.

3. If you have time, could you share the feature extraction result for the first image in the Pittsburgh dataset, i.e., the imgname.hesaff file? I want to verify whether I have extracted enough features.

Thanks!

You are right about the self-similarity. When we did the paper, we noticed that we were not able to beat the original DisLoc implementation (we did not have the implementation, but the results) with spatial re-ranking (see Fig. 2 in the paper). This might be one of the reasons why our implementation was worse without taking geometric burstiness into account.
Could you maybe create a pull request with your self-similarity computation? I will have a look at it as soon as I have some time.

Regarding your questions:

  1. I can't remember that we ever looked at the performance of the system without spatial verification. The goal of the project was to improve the spatial verification performance by taking repeating patterns into account. In addition, we were more interested in the recall @ precision N, so as to be able to distinguish between correctly and incorrectly retrieved images. Compared to what we used in the paper, we did not change the retrieval part of the pipeline (a few bugfixes aside). I had to rewrite the geometric burstiness part though.
    Looking at the original DisLoc paper, it seems they achieve a recall@1 of around 57% on Pittsburgh.

  2. If I remember correctly, we used convert / mogrify.

  3. Could you give me the name of one of the files? Then I can check whether I still have the extracted features.

Thanks for your reply!
I will create a pull request with the updated self-similarity computation as soon as possible.

The name of the first database image of the Pittsburgh250k dataset is 000/000000_pitch1_yaw1.jpg.

I can't find the feature files for the database images from Pittsburgh anymore. I probably had to delete them as I was running out of memory (I still have the database itself, which also contains all features). I also do not have the .hesaff files anymore, only the binary files I created out of them. You can find the .bin file for the first query image (000/000015_pitch1_yaw1.bin) here. If I remember correctly, the file should also contain the IDs of the 5 nearest neighboring visual words.
Unfortunately, I no longer remember which feature detector was used to extract the descriptors (if I remember correctly, this was done by the second author of the paper, who has since left academia).

OK!
Thanks for your help!