NVlabs/FasterViT

Size mismatch when loading pretrained weights.

TsingWei opened this issue · 1 comments

After installing fastervit (0.9.8) from pip, I tried to run the example test code:

from fastervit import create_model

model = create_model('faster_vit_0_any_res',
                     resolution=[576, 960],
                     window_size=[7, 7, 12, 6],
                     ct_size=2,
                     dim=64,
                     pretrained=True)

However, an error occurs because of size mismatches for some parameters:

The model and loaded state dict do not match exactly

size mismatch for levels.2.blocks.0.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.0.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.0.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.0.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.0.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.0.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.0.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.1.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.1.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.1.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.1.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.1.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.1.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.1.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.2.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.2.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.2.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.2.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.2.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.2.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.2.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.3.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.3.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.3.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.3.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.3.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.3.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.3.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.4.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.4.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.4.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.4.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.4.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.4.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.4.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.2.blocks.5.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 256]) from checkpoint, the shape in current model is torch.Size([1, 144, 256]).
size mismatch for levels.2.blocks.5.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 23, 23, 2]).
size mismatch for levels.2.blocks.5.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([144, 144]).
size mismatch for levels.2.blocks.5.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 53, 53]) from checkpoint, the shape in current model is torch.Size([1, 8, 148, 148]).
size mismatch for levels.2.blocks.5.hat_attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 7, 7, 2]) from checkpoint, the shape in current model is torch.Size([1, 13, 13, 2]).
size mismatch for levels.2.blocks.5.hat_attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([16, 16]) from checkpoint, the shape in current model is torch.Size([49, 49]).
size mismatch for levels.2.blocks.5.hat_attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 8, 16, 16]) from checkpoint, the shape in current model is torch.Size([1, 8, 60, 60]).
size mismatch for levels.3.blocks.0.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.0.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.0.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.0.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
size mismatch for levels.3.blocks.1.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.1.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.1.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.1.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
size mismatch for levels.3.blocks.2.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.2.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.2.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.2.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
size mismatch for levels.3.blocks.3.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.3.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.3.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.3.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
size mismatch for levels.3.blocks.4.pos_embed.relative_bias: copying a param with shape torch.Size([1, 49, 512]) from checkpoint, the shape in current model is torch.Size([1, 36, 512]).
size mismatch for levels.3.blocks.4.attn.pos_emb_funct.relative_coords_table: copying a param with shape torch.Size([1, 13, 13, 2]) from checkpoint, the shape in current model is torch.Size([1, 11, 11, 2]).
size mismatch for levels.3.blocks.4.attn.pos_emb_funct.relative_position_index: copying a param with shape torch.Size([49, 49]) from checkpoint, the shape in current model is torch.Size([36, 36]).
size mismatch for levels.3.blocks.4.attn.pos_emb_funct.relative_bias: copying a param with shape torch.Size([1, 16, 49, 49]) from checkpoint, the shape in current model is torch.Size([1, 16, 36, 36]).
unexpected key in source state_dict: levels.2.blocks.0.hat_pos_embed.relative_bias, levels.2.blocks.0.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.0.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.0.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.1.hat_pos_embed.relative_bias, levels.2.blocks.1.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.1.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.1.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.2.hat_pos_embed.relative_bias, levels.2.blocks.2.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.2.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.2.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.3.hat_pos_embed.relative_bias, levels.2.blocks.3.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.3.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.3.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.4.hat_pos_embed.relative_bias, levels.2.blocks.4.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.4.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.4.hat_pos_embed.cpb_mlp.2.weight, levels.2.blocks.5.hat_pos_embed.relative_bias, levels.2.blocks.5.hat_pos_embed.cpb_mlp.0.weight, levels.2.blocks.5.hat_pos_embed.cpb_mlp.0.bias, levels.2.blocks.5.hat_pos_embed.cpb_mlp.2.weight

I noticed that all the mismatches are related to the position embeddings inside the window attention. Is this a bug, or is it intended to work like this?

Thanks @TsingWei. It is not a bug. We don't resize the pos_emb in the pip package.
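
If the warnings are noisy, one workaround is to filter out the shape-mismatched, resolution-dependent position-embedding buffers before loading, so the model keeps the buffers it built for the requested resolution. This is not part of the package, just a minimal sketch; the local checkpoint filename and the 'state_dict' key are assumptions about the checkpoint layout:

import torch
from fastervit import create_model

# Build the any-resolution model without loading pretrained weights yet.
model = create_model('faster_vit_0_any_res',
                     resolution=[576, 960],
                     window_size=[7, 7, 12, 6],
                     ct_size=2,
                     dim=64,
                     pretrained=False)

# Hypothetical path to a locally downloaded checkpoint.
ckpt = torch.load('faster_vit_0.pth.tar', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

# Keep only entries whose shapes match the current model; the
# resolution-dependent relative-position buffers are skipped and the
# freshly initialized ones in the model are used instead.
model_sd = model.state_dict()
filtered = {k: v for k, v in state_dict.items()
            if k in model_sd and v.shape == model_sd[k].shape}

model.load_state_dict(filtered, strict=False)
print(f'skipped {len(state_dict) - len(filtered)} mismatched entries')

Note that load_state_dict(strict=False) alone is not enough here, since PyTorch still raises on shape mismatches; pre-filtering the state dict avoids that.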