wzzheng/TPVFormer

Higher inference resolution

amundra15 opened this issue · 1 comment

Hi,

Thanks for releasing this amazing work. I am curious about the interpolation capabilities of the method at test time. If I understand it correctly, the training voxel resolution was 100x100x8, but the inference resolution can be anything.

However, upon increasing the resolution as below:

```
tpv_h_ = 200
tpv_w_ = 200
tpv_z_ = 16
scale_h = 1
scale_w = 1
scale_z = 1
```
I get an error:
```
RuntimeError: Error(s) in loading state_dict for TPVFormer:
    size mismatch for tpv_head.tpv_mask_hw: copying a param with shape torch.Size([1, 100, 100]) from checkpoint, the shape in current model is torch.Size([1, 200, 200]).
    size mismatch for tpv_head.positional_encoding.row_embed.weight: copying a param with shape torch.Size([100, 128]) from checkpoint, the shape in current model is torch.Size([200, 128]).
    size mismatch for tpv_head.positional_encoding.col_embed.weight: copying a param with shape torch.Size([100, 128]) from checkpoint, the shape in current model is torch.Size([200, 128]).
    size mismatch for tpv_head.encoder.ref_3d_hw: copying a param with shape torch.Size([1, 4, 10000, 3]) from checkpoint, the shape in current model is torch.Size([1, 4, 40000, 3]).
    size mismatch for tpv_head.encoder.ref_3d_zh: copying a param with shape torch.Size([1, 32, 800, 3]) from checkpoint, the shape in current model is torch.Size([1, 32, 3200, 3]).
    size mismatch for tpv_head.encoder.ref_3d_wz: copying a param with shape torch.Size([1, 32, 800, 3]) from checkpoint, the shape in current model is torch.Size([1, 32, 3200, 3]).
    size mismatch for tpv_head.encoder.ref_2d_hw: copying a param with shape torch.Size([1, 10000, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 40000, 1, 2]).
    size mismatch for tpv_head.encoder.ref_2d_zh: copying a param with shape torch.Size([1, 800, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 3200, 1, 2]).
    size mismatch for tpv_head.encoder.ref_2d_wz: copying a param with shape torch.Size([1, 800, 1, 2]) from checkpoint, the shape in current model is torch.Size([1, 3200, 1, 2]).
    size mismatch for tpv_head.tpv_embedding_hw.weight: copying a param with shape torch.Size([10000, 256]) from checkpoint, the shape in current model is torch.Size([40000, 256]).
    size mismatch for tpv_head.tpv_embedding_zh.weight: copying a param with shape torch.Size([800, 256]) from checkpoint, the shape in current model is torch.Size([3200, 256]).
    size mismatch for tpv_head.tpv_embedding_wz.weight: copying a param with shape torch.Size([800, 256]) from checkpoint, the shape in current model is torch.Size([3200, 256]).
```

It seems like some of the modules cannot handle the changed output size. Could you guide me on how to achieve variable inference resolution?
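(Editor's note on the error: every mismatched tensor in the traceback is sized from the tpv_* values, e.g. tpv_embedding_hw.weight is [100*100, 256] in the checkpoint, so enlarging tpv_h_/tpv_w_/tpv_z_ changes the model's parameter shapes and the trained checkpoint no longer loads. A minimal sketch of the idea, with hypothetical construction that merely matches the shapes reported above, not necessarily the repository's exact code:)

```python
import torch.nn as nn

# Assumed sketch: learned TPV query embeddings sized from the plane resolutions.
# With the training values below, the shapes match the checkpoint in the error:
# hw -> [10000, 256], zh -> [800, 256], wz -> [800, 256].
tpv_h_, tpv_w_, tpv_z_, embed_dims = 100, 100, 8, 256

tpv_embedding_hw = nn.Embedding(tpv_h_ * tpv_w_, embed_dims)
tpv_embedding_zh = nn.Embedding(tpv_z_ * tpv_h_, embed_dims)
tpv_embedding_wz = nn.Embedding(tpv_w_ * tpv_z_, embed_dims)

# Setting tpv_h_ = tpv_w_ = 200, tpv_z_ = 16 would change these to
# [40000, 256] and [3200, 256], which is exactly the size mismatch reported.
```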

I figured out that I need to change the scale_* parameters rather than the tpv_* ones (see the sketch below). Closing the issue.
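(Editor's note: for anyone hitting the same error, a sketch of the config change implied by the resolution above, reusing the parameter names from the snippet in the issue. This keeps the learned parameters at the training size and only rescales the output grid; treat it as an illustration, not the repository's official config:)

```python
# Keep the TPV plane sizes at the values used for training, so the
# checkpoint's parameter shapes still match the model.
tpv_h_ = 100
tpv_w_ = 100
tpv_z_ = 8

# Rescale only the output grid at inference time:
# 100 * 2 = 200, 100 * 2 = 200, 8 * 2 = 16.
scale_h = 2
scale_w = 2
scale_z = 2
```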