WangYixuan12/d3fields

Question about the shape of semantic feature map

Gloryseven opened this issue · 1 comment

Hello! The DINOv2 feature map has size `patch_h, patch_w`, but the mask image has size `H, W`. In the interpolation section of the paper, both are written as `H, W`. How is this mismatch handled in the code?

During interpolation, each 3D point is projected into 2D image space and its pixel coordinates are normalized to 0~1. Because sampling is driven by these normalized coordinates rather than by raw pixel indices, it does not matter that `H` differs from `patch_h` (or `W` from `patch_w`). More details can be found at https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html
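A minimal sketch of the idea, not the repository's actual code: the same projected pixel coordinates are normalized by the original image size and then used to sample both a low-resolution feature map and a full-resolution mask. Assumptions: the helper names `project_points` and `sample_at_pixels` are hypothetical, the intrinsics `K` are made up, the image is 224 x 224, DINOv2 uses 14 x 14 patches (giving a 16 x 16 grid), and `align_corners=True` is one possible pixel-center convention. Note that `grid_sample` itself expects coordinates in [-1, 1], so 0~1 values are rescaled before sampling.

```python
import torch
import torch.nn.functional as F


def project_points(points_cam: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coords (u, v)."""
    uvw = points_cam @ K.T              # (N, 3): (fx*X + cx*Z, fy*Y + cy*Z, Z)
    return uvw[:, :2] / uvw[:, 2:3]     # divide by depth Z


def sample_at_pixels(image: torch.Tensor, uv: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """Bilinearly sample a (C, h, w) map at pixel coords uv given in the H x W image frame.

    h, w may differ from H, W: uv is normalized by the ORIGINAL image size,
    so the same coordinates address any resolution of `image`.
    """
    # Normalize to [0, 1] using the original image size (hypothetical convention).
    u01 = uv[:, 0] / (W - 1)
    v01 = uv[:, 1] / (H - 1)
    # Rescale to [-1, 1], the range grid_sample expects; order is (x, y).
    grid = torch.stack([u01 * 2 - 1, v01 * 2 - 1], dim=-1).view(1, 1, -1, 2)
    out = F.grid_sample(image.unsqueeze(0), grid, mode="bilinear", align_corners=True)
    return out[0, :, 0, :].T            # (1, C, 1, N) -> (N, C)


H, W = 224, 224                          # original image size (assumed)
patch_h, patch_w = H // 14, W // 14      # 16 x 16 DINOv2 feature grid (assumed patch size 14)
K = torch.tensor([[200.0, 0.0, 111.5],
                  [0.0, 200.0, 111.5],
                  [0.0, 0.0, 1.0]])      # hypothetical intrinsics

points = torch.tensor([[0.0, 0.0, 1.0], [0.1, 0.2, 1.0]])
uv = project_points(points, K)

feat = torch.randn(384, patch_h, patch_w)   # stand-in for DINOv2 features
mask = torch.zeros(1, H, W)
mask[:, H // 2:, :] = 1.0                   # bottom half of the image is "object"

f = sample_at_pixels(feat, uv, H, W)    # (2, 384), sampled from a 16 x 16 map
m = sample_at_pixels(mask, uv, H, W)    # (2, 1), sampled from a 224 x 224 map
```

The key design point is that the normalization divides by the original `H, W`, never by `patch_h, patch_w`; once coordinates are resolution-independent, `grid_sample` handles the different map sizes automatically.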