rmislam/PythonSIFT

Two questions about keypoint scale

Doom9234 opened this issue · 3 comments

Hi, thank you for the great work; it has helped me a lot while learning SIFT. I have two questions about keypoint.size:

  1. When computing keypoint.size in localizeExtremumViaQuadraticFit(), you used octave_index + 1 instead of octave_index. Is this because the original input image is doubled in size by linear interpolation?

  2. In computeKeypointsWithOrientations(), the scale is multiplied by 0.5 and divided by 2 ** octave_index, which is a little different from the original paper. Am I missing something?

I think the scale used to compute the orientation should be (2 ** ((image_index + extremum_update[2]) / float32(num_intervals))) * (2 ** octave_index),
which is different from the scale in computeKeypointsWithOrientations().

Hi there! Thanks for looking into PythonSIFT so deeply.

Yes, I'm using octave_index + 1 instead of octave_index because we double the input image size. Later in convertKeypointsToInputImageSize(), we do keypoint.size *= 0.5 to account for this.
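To make the doubling and halving concrete, here is a minimal sketch in plain Python. It only mirrors the keypoint.size formula quoted in this thread and the keypoint.size *= 0.5 step; the function names and the values of sigma, num_intervals and the sub-layer offset are assumptions chosen for illustration, not values read from pysift.py.

```python
import math

SIGMA = 1.6        # assumed base blur, for illustration only
NUM_INTERVALS = 3  # assumed number of intervals per octave


def size_in_doubled_image(image_index, offset, octave_index):
    """Mirror of the keypoint.size formula quoted in this thread (octave_index + 1)."""
    return SIGMA * (2 ** ((image_index + offset) / float(NUM_INTERVALS))) * (2 ** (octave_index + 1))


def size_in_input_image(size):
    """The keypoint.size *= 0.5 step attributed to convertKeypointsToInputImageSize()."""
    return 0.5 * size


def size_with_plain_octave_index(image_index, offset, octave_index):
    """What the formula would give with octave_index instead of octave_index + 1."""
    return SIGMA * (2 ** ((image_index + offset) / float(NUM_INTERVALS))) * (2 ** octave_index)


for octave_index in range(3):
    halved = size_in_input_image(size_in_doubled_image(1, 0.25, octave_index))
    plain = size_with_plain_octave_index(1, 0.25, octave_index)
    # The extra factor of 2 from "+ 1" is exactly cancelled by the later halving.
    assert math.isclose(halved, plain)
    print(octave_index, halved, plain)
```

In other words, under these assumptions the "+ 1" and the later "*= 0.5" cancel, so the final size matches what the plain octave_index formula would produce.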

Actually, the way scale is computed in computeKeypointsWithOrientations() is consistent with the way keypoint.size is computed in localizeExtremumViaQuadraticFit(). Let's compare line 181:

keypoint.size = sigma * (2 ** ((image_index + extremum_update[2]) / float32(num_intervals))) * (2 ** (octave_index + 1))

with line 227:

scale = scale_factor * 0.5 * keypoint.size / float32(2 ** octave_index)

Note that we can rewrite line 227 like this, and it's mathematically equivalent:

scale = scale_factor * keypoint.size / float32(2 ** (octave_index + 1))

In fact, I made this change in the latest commit in order to be consistent with line 181. Sorry for the confusion. However, there's nothing wrong with the original line 227. We are simply reversing the * (2 ** (octave_index + 1)) operation done in line 181 to recover the scale within the octave. We then multiply by scale_factor, as mentioned in the paper.
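As a quick numerical check of the equivalence, the sketch below evaluates both forms of line 227 on sizes produced by the line 181 formula. It uses plain Python floats rather than float32, and the function names and the values of sigma, num_intervals, scale_factor and the sub-layer offset are assumptions picked for illustration only.

```python
import math


def keypoint_size(sigma, image_index, offset, num_intervals, octave_index):
    """Line 181 as quoted in this thread."""
    return sigma * (2 ** ((image_index + offset) / float(num_intervals))) * (2 ** (octave_index + 1))


def scale_original(scale_factor, size, octave_index):
    """Line 227 as originally written."""
    return scale_factor * 0.5 * size / float(2 ** octave_index)


def scale_rewritten(scale_factor, size, octave_index):
    """Line 227 after the rewrite."""
    return scale_factor * size / float(2 ** (octave_index + 1))


sigma, num_intervals, scale_factor, offset = 1.6, 3, 1.5, 0.1  # illustrative values

for octave_index in range(4):
    for image_index in range(1, num_intervals + 1):
        size = keypoint_size(sigma, image_index, offset, num_intervals, octave_index)
        a = scale_original(scale_factor, size, octave_index)
        b = scale_rewritten(scale_factor, size, octave_index)
        # Undoing the 2 ** (octave_index + 1) factor leaves the scale within the octave.
        within_octave = scale_factor * sigma * 2 ** ((image_index + offset) / float(num_intervals))
        assert math.isclose(a, b)
        assert math.isclose(a, within_octave)

print("both forms of line 227 agree for every octave and layer tested")
```

Note that the recovered scale no longer depends on octave_index, which is what you would expect when working on the image at that octave's own resolution.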

Note that size here corresponds to the same size variable used in OpenCV's SIFT implementation, which you can find here:

https://github.com/opencv/opencv_contrib/blob/master/modules/xfeatures2d/src/sift.cpp

If you take a look at lines 572 and 661, you can see they compute size and scale the same way. You can even set up breakpoints in OpenCV's sift.cpp and PythonSIFT's pysift.py, and compare the keypoints at each step of the computation and verify they are the same within rounding error. The size variable isn't mentioned in the original SIFT paper -- it's just a convenient way to store the scale across different octaves and layers using a single number.

I hope this helps! Does this answer your questions?

I believe this issue has been resolved, so I'm going to close it. Please feel free to reopen this issue if you feel your question hasn't been answered.