Question about the shape of the semantic feature map

Question

Gloryseven opened this issue 10 months ago · 1 comments

Hello! The DINOv2 feature map has size 'patch_h, patch_w', but the mask image has size 'H, W'. In the interpolation section of the paper, both are written with the same size ('H, W'). How is this handled in the code?

Answer 1 · 2024-01-16T18:04:17.000Z

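During the interpolation, each 3D point is projected into 2D image space and its coordinates are normalized to 0~1. Normalized coordinates do not depend on the resolution of the map being sampled, so it does not matter if H does not equal patch_h. (Note that grid_sample itself expects coordinates in the range [-1, 1].) More details can be seen in https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html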
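
To illustrate why the resolution mismatch is harmless, here is a minimal pure-Python sketch of sampling by normalized coordinates. It is not the repository's code: `normalize_pixel`, `sample_bilinear`, and `make_ramp` are hypothetical helpers, the sizes are made up, and the sampler only mimics the coordinate convention of grid_sample with align_corners=False (clamping at the border instead of zero padding). The same projected point is looked up in two feature maps of different resolution and gives the same value.

```python
def normalize_pixel(u, v, width, height):
    """Map a projected pixel (u, v) in a (height, width) image to 0~1."""
    return u / width, v / height


def sample_bilinear(feat, x01, y01):
    """Bilinearly sample feat[row][col] at normalized coordinates in 0~1.

    Cell centers sit at (i + 0.5) / n, the same convention as
    grid_sample with align_corners=False. For torch's grid_sample the
    coordinate would first be mapped to [-1, 1] via 2 * x01 - 1.
    Out-of-range positions are clamped to the border (a simplification).
    """
    rows, cols = len(feat), len(feat[0])
    # Convert the normalized coordinate to a continuous index in this grid.
    fx = min(max(x01 * cols - 0.5, 0.0), cols - 1.0)
    fy = min(max(y01 * rows - 0.5, 0.0), rows - 1.0)
    x0, y0 = int(fx), int(fy)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    wx, wy = fx - x0, fy - y0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def make_ramp(rows, cols):
    """Toy feature map whose value is the normalized x of each cell center."""
    return [[(c + 0.5) / cols for c in range(cols)] for _ in range(rows)]


# Hypothetical sizes: mask image is H x W, feature maps are much coarser.
H, W = 480, 640
x01, y01 = normalize_pixel(192, 240, W, H)  # projected 3D point, in pixels

coarse = make_ramp(3, 4)   # stands in for a patch_h x patch_w feature map
finer = make_ramp(6, 8)    # a different resolution of the same content

value_coarse = sample_bilinear(coarse, x01, y01)
value_finer = sample_bilinear(finer, x01, y01)
```

Both lookups use the same `(x01, y01)`, so only the normalization step needs to know `H, W`; the sampler works out the index from whatever grid size it is given.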