microsoft/FocalNet

focalnet_large_fl4_o365_finetuned_on_coco.pth size mismatch

Shiro-LK opened this issue · 1 comment

Hello,

Thank you for sharing your experiment.

I am trying to train an object detection model based on FocalNet-Large, starting from this checkpoint:
https://github.com/FocalNet/FocalNet-DINO#training

However, several size mismatches occur when loading it. This happens with the checkpoint pretrained on Objects365 and then fine-tuned on the COCO dataset. I am using the config file "DINO_4scale_focalnet_large_fl4.py" instead of "DINO_4scale_focalnet_fl4.py", as I did not find the latter in the repo. I was wondering whether the config file uploaded in the repo is the correct one? (A small sanity check on the checkpoint shapes is sketched after the traceback below.)

Here is the error message:

RuntimeError: Error(s) in loading state_dict for DINO:
        size mismatch for transformer.level_embed: copying a param with shape torch.Size([5, 256]) from checkpoint, the shape in current model is torch.Size([4, 256]).
        size mismatch for transformer.encoder.layers.0.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.0.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.0.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.0.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.1.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.1.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.1.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.1.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.2.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.2.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.2.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.2.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.3.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.3.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.3.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.3.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.4.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.4.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.4.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.4.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.encoder.layers.5.self_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.encoder.layers.5.self_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.encoder.layers.5.self_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.encoder.layers.5.self_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.0.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.0.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.0.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.0.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.1.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.1.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.1.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.1.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.2.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.2.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.2.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.2.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.3.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.3.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.3.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.3.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.4.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.4.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.4.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.4.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for transformer.decoder.layers.5.cross_attn.sampling_offsets.weight: copying a param with shape torch.Size([320, 256]) from checkpoint, the shape in current model is torch.Size([256, 256]).
        size mismatch for transformer.decoder.layers.5.cross_attn.sampling_offsets.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for transformer.decoder.layers.5.cross_attn.attention_weights.weight: copying a param with shape torch.Size([160, 256]) from checkpoint, the shape in current model is torch.Size([128, 256]).
        size mismatch for transformer.decoder.layers.5.cross_attn.attention_weights.bias: copying a param with shape torch.Size([160]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for input_proj.0.0.weight: copying a param with shape torch.Size([256, 192, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 384, 1, 1]).
        size mismatch for input_proj.1.0.weight: copying a param with shape torch.Size([256, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 768, 1, 1]).
        size mismatch for input_proj.2.0.weight: copying a param with shape torch.Size([256, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1536, 1, 1]).
        size mismatch for input_proj.3.0.weight: copying a param with shape torch.Size([256, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 1536, 3, 3]).
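
For reference, a quick way to confirm how many feature levels the checkpoint expects is to inspect its parameter shapes directly. This is only a minimal sketch: it assumes the downloaded file is a standard torch checkpoint whose weights live under a "model" key (key names are taken from the error above), and the local path is hypothetical.

```python
import torch

# Hypothetical local path to the downloaded checkpoint.
ckpt_path = "focalnet_large_fl4_o365_finetuned_on_coco.pth"

# Load on CPU; assumes the state dict is stored under the "model" key,
# which is common for DINO-style checkpoints (falls back to the raw dict).
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = ckpt.get("model", ckpt)

# level_embed has one row per feature level, so its first dimension tells
# how many scales the checkpoint was trained with (5 here vs. 4 in the config).
print("level_embed shape:", tuple(state_dict["transformer.level_embed"].shape))

# The input projections reveal which backbone stages feed the transformer.
for k, v in state_dict.items():
    if k.startswith("input_proj.") and k.endswith(".0.weight"):
        print(k, tuple(v.shape))
```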

Hi @Shiro-LK, I think it is because you are using the 4-scale DINO config, while this checkpoint was trained with 5 scales. I have changed it to a 5-scale config. Could you please try this new config:

https://github.com/FocalNet/FocalNet-DINO/blob/main/config/DINO/DINO_5scale_focalnet_large_fl4.py
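
For context, the mismatched shapes line up with the number of feature levels used by deformable attention: each layer projects queries to n_heads * n_levels * n_points * 2 sampling offsets and n_heads * n_levels * n_points attention weights. With the defaults assumed here (n_heads=8, n_points=4, not read from the config), 5 levels gives the 320/160 stored in the checkpoint, while a 4-scale config builds layers of size 256/128, hence the errors above. A tiny check:

```python
# Expected projection sizes in deformable attention; n_heads=8 and n_points=4
# are assumed defaults, not values read from the FocalNet-DINO config.
n_heads, n_points = 8, 4
for n_levels in (4, 5):
    offsets = n_heads * n_levels * n_points * 2   # sampling_offsets out_features
    weights = n_heads * n_levels * n_points       # attention_weights out_features
    print(f"{n_levels} levels -> offsets {offsets}, weights {weights}")
# 4 levels -> offsets 256, weights 128   (current 4-scale model)
# 5 levels -> offsets 320, weights 160   (checkpoint)
```

The input_proj mismatches tell the same story: the checkpoint projects all four FocalNet-Large stages (192/384/768/1536 channels) plus an extra downsampled level, whereas the 4-scale model starts from the 384-channel stage.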