hustvl/MIMDet

Loading the pre-trained weight from MAE

youngwanLEE opened this issue · 3 comments

Hi,

I have a question about loading the pre-trained weight from MAE.

When I trained the mimdet_vit_large model using the command below, I got the incompatible-keys messages shown below, as well as a "No checkpoint found. Initializing model from scratch" message from fvcore.common.checkpoint.

It seems that the pre-trained checkpoint from MAE was not loaded.

Are these messages normal?

Thanks in advance :)

training command:
python lazyconfig_train_net.py --config-file configs/mimdet/mimdet_vit_large_mask_rcnn_fpn_mr_0p5_800_1333_4xdec_coco_3x --num-gpus <GPU_NUM> mae_checkpoint.path="./checkpoints/mae_pretrain_vit_large_full.pth"
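
(Side note: the trailing mae_checkpoint.path=... argument is a dotted LazyConfig override rather than a CfgNode option. A minimal sketch of how it is resolved, assuming the standard detectron2 LazyConfig API and the paths from the command above, with the ".py" suffix assumed:)

```python
from detectron2.config import LazyConfig

# Mirrors what detectron2's lazyconfig_train_net.py does with its positional
# "opts": load the lazy config, then apply dotted key=value overrides.
cfg = LazyConfig.load(
    "configs/mimdet/mimdet_vit_large_mask_rcnn_fpn_mr_0p5_800_1333_4xdec_coco_3x.py"
)
cfg = LazyConfig.apply_overrides(
    cfg, ["mae_checkpoint.path=./checkpoints/mae_pretrain_vit_large_full.pth"]
)
print(cfg.mae_checkpoint.path)  # ./checkpoints/mae_pretrain_vit_large_full.pth
```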

incompatible-keys messages:
Loading ViT Encoder pretrained weights from ./checkpoints/mae_pretrain_vit_large_full.pth.
_IncompatibleKeys(missing_keys=['patch_embed.proj.0.0.weight', 'patch_embed.proj.0.1.weight', 'patch_embed.proj.0.1.bias', 'patch_embed.proj.1.0.weight', 'patch_embed.proj.1.1.weight', 'patch_embed.proj.1.1.bias', 'patch_embed.proj.2.0.weight', 'patch_embed.proj.2.1.weight', 'patch_embed.proj.2.1.bias', 'patch_embed.proj.3.0.weight', 'patch_embed.proj.3.1.weight', 'patch_embed.proj.3.1.bias', 'patch_embed.proj.3.3.weight', 'patch_embed.proj.3.3.bias'],
unexpected_keys=['cls_token', 'mask_token', 'decoder_pos_embed', 'decoder_embed.weight', 'decoder_embed.bias',
'decoder_blocks.0.norm1.weight', 'decoder_blocks.0.norm1.bias', 'decoder_blocks.0.attn.qkv.weight', 'decoder_blocks.0.attn.qkv.bias', 'decoder_blocks.0.attn.proj.weight', 'decoder_blocks.0.attn.proj.bias', 'decoder_blocks.0.norm2.weight', 'decoder_blocks.0.norm2.bias', 'decoder_blocks.0.mlp.fc1.weight', 'decoder_blocks.0.mlp.fc1.bias', 'decoder_blocks.0.mlp.fc2.weight', 'decoder_blocks.0.mlp.fc2.bias',
'decoder_blocks.1.norm1.weight', 'decoder_blocks.1.norm1.bias', 'decoder_blocks.1.attn.qkv.weight', 'decoder_blocks.1.attn.qkv.bias', 'decoder_blocks.1.attn.proj.weight', 'decoder_blocks.1.attn.proj.bias', 'decoder_blocks.1.norm2.weight', 'decoder_blocks.1.norm2.bias', 'decoder_blocks.1.mlp.fc1.weight', 'decoder_blocks.1.mlp.fc1.bias', 'decoder_blocks.1.mlp.fc2.weight', 'decoder_blocks.1.mlp.fc2.bias',
'decoder_blocks.2.norm1.weight', 'decoder_blocks.2.norm1.bias', 'decoder_blocks.2.attn.qkv.weight', 'decoder_blocks.2.attn.qkv.bias', 'decoder_blocks.2.attn.proj.weight', 'decoder_blocks.2.attn.proj.bias', 'decoder_blocks.2.norm2.weight', 'decoder_blocks.2.norm2.bias', 'decoder_blocks.2.mlp.fc1.weight', 'decoder_blocks.2.mlp.fc1.bias', 'decoder_blocks.2.mlp.fc2.weight', 'decoder_blocks.2.mlp.fc2.bias',
'decoder_blocks.3.norm1.weight', 'decoder_blocks.3.norm1.bias', 'decoder_blocks.3.attn.qkv.weight', 'decoder_blocks.3.attn.qkv.bias', 'decoder_blocks.3.attn.proj.weight', 'decoder_blocks.3.attn.proj.bias', 'decoder_blocks.3.norm2.weight', 'decoder_blocks.3.norm2.bias', 'decoder_blocks.3.mlp.fc1.weight', 'decoder_blocks.3.mlp.fc1.bias', 'decoder_blocks.3.mlp.fc2.weight', 'decoder_blocks.3.mlp.fc2.bias',
'decoder_blocks.4.norm1.weight', 'decoder_blocks.4.norm1.bias', 'decoder_blocks.4.attn.qkv.weight', 'decoder_blocks.4.attn.qkv.bias', 'decoder_blocks.4.attn.proj.weight', 'decoder_blocks.4.attn.proj.bias', 'decoder_blocks.4.norm2.weight', 'decoder_blocks.4.norm2.bias', 'decoder_blocks.4.mlp.fc1.weight', 'decoder_blocks.4.mlp.fc1.bias', 'decoder_blocks.4.mlp.fc2.weight', 'decoder_blocks.4.mlp.fc2.bias',
'decoder_blocks.5.norm1.weight', 'decoder_blocks.5.norm1.bias', 'decoder_blocks.5.attn.qkv.weight', 'decoder_blocks.5.attn.qkv.bias', 'decoder_blocks.5.attn.proj.weight', 'decoder_blocks.5.attn.proj.bias', 'decoder_blocks.5.norm2.weight', 'decoder_blocks.5.norm2.bias', 'decoder_blocks.5.mlp.fc1.weight', 'decoder_blocks.5.mlp.fc1.bias', 'decoder_blocks.5.mlp.fc2.weight', 'decoder_blocks.5.mlp.fc2.bias',
'decoder_blocks.6.norm1.weight', 'decoder_blocks.6.norm1.bias', 'decoder_blocks.6.attn.qkv.weight', 'decoder_blocks.6.attn.qkv.bias', 'decoder_blocks.6.attn.proj.weight', 'decoder_blocks.6.attn.proj.bias', 'decoder_blocks.6.norm2.weight', 'decoder_blocks.6.norm2.bias', 'decoder_blocks.6.mlp.fc1.weight', 'decoder_blocks.6.mlp.fc1.bias', 'decoder_blocks.6.mlp.fc2.weight', 'decoder_blocks.6.mlp.fc2.bias',
'decoder_blocks.7.norm1.weight', 'decoder_blocks.7.norm1.bias', 'decoder_blocks.7.attn.qkv.weight', 'decoder_blocks.7.attn.qkv.bias', 'decoder_blocks.7.attn.proj.weight', 'decoder_blocks.7.attn.proj.bias', 'decoder_blocks.7.norm2.weight', 'decoder_blocks.7.norm2.bias', 'decoder_blocks.7.mlp.fc1.weight', 'decoder_blocks.7.mlp.fc1.bias', 'decoder_blocks.7.mlp.fc2.weight', 'decoder_blocks.7.mlp.fc2.bias',
'decoder_norm.weight', 'decoder_norm.bias', 'decoder_pred.weight', 'decoder_pred.bias', 'patch_embed.proj.weight', 'patch_embed.proj.bias'])
Loading ViT Decoder pretrained weights from ./checkpoints/mae_pretrain_vit_large_full.pth.
_IncompatibleKeys(missing_keys=[], unexpected_keys=['cls_token', 'pos_embed', 'patch_embed.proj.weight', 'patch_embed.proj.bias', 'blocks.0.norm1.weight', 'blocks.0.norm1.bias', 'blocks.0.attn.qkv.weight', 'blocks.0.attn.qkv.bias', 'blocks.0.attn.proj.weight', 'blocks.0.a
ttn.proj.bias', 'blocks.0.norm2.weight', 'blocks.0.norm2.bias', 'blocks.0.mlp.fc1.weight', 'blocks.0.mlp.fc1.bias', 'blocks.0.mlp.fc2.weight', 'blocks.0.mlp.fc2.bias', 'blocks.1.norm1.weight', 'blocks.1.norm1.bias', 'blocks.1.attn.qkv.weight', 'blocks.1.attn.qkv.bias', 'b
locks.1.attn.proj.weight', 'blocks.1.attn.proj.bias', 'blocks.1.norm2.weight', 'blocks.1.norm2.bias', 'blocks.1.mlp.fc1.weight', 'blocks.1.mlp.fc1.bias', 'blocks.1.mlp.fc2.weight', 'blocks.1.mlp.fc2.bias', 'blocks.2.norm1.weight', 'blocks.2.norm1.bias', 'blocks.2.attn.qkv
.weight', 'blocks.2.attn.qkv.bias', 'blocks.2.attn.proj.weight', 'blocks.2.attn.proj.bias', 'blocks.2.norm2.weight', 'blocks.2.norm2.bias', 'blocks.2.mlp.fc1.weight', 'blocks.2.mlp.fc1.bias', 'blocks.2.mlp.fc2.weight', 'blocks.2.mlp.fc2.bias', 'blocks.3.norm1.weight', 'bl
ocks.3.norm1.bias', 'blocks.3.attn.qkv.weight', 'blocks.3.attn.qkv.bias', 'blocks.3.attn.proj.weight', 'blocks.3.attn.proj.bias', 'blocks.3.norm2.weight', 'blocks.3.norm2.bias', 'blocks.3.mlp.fc1.weight', 'blocks.3.mlp.fc1.bias', 'blocks.3.mlp.fc2.weight', 'blocks.3.mlp.f
c2.bias', 'blocks.4.norm1.weight', 'blocks.4.norm1.bias', 'blocks.4.attn.qkv.weight', 'blocks.4.attn.qkv.bias', 'blocks.4.attn.proj.weight', 'blocks.4.attn.proj.bias', 'blocks.4.norm2.weight', 'blocks.4.norm2.bias', 'blocks.4.mlp.fc1.weight', 'blocks.4.mlp.fc1.bias', 'blo
cks.4.mlp.fc2.weight', 'blocks.4.mlp.fc2.bias', 'blocks.5.norm1.weight', 'blocks.5.norm1.bias', 'blocks.5.attn.qkv.weight', 'blocks.5.attn.qkv.bias', 'blocks.5.attn.proj.weight', 'blocks.5.attn.proj.bias', 'blocks.5.norm2.weight', 'blocks.5.norm2.bias', 'blocks.5.mlp.fc1.
weight', 'blocks.5.mlp.fc1.bias', 'blocks.5.mlp.fc2.weight', 'blocks.5.mlp.fc2.bias', 'blocks.6.norm1.weight', 'blocks.6.norm1.bias', 'blocks.6.attn.qkv.weight', 'blocks.6.attn.qkv.bias', 'blocks.6.attn.proj.weight', 'blocks.6.attn.proj.bias', 'blocks.6.norm2.weight', 'bl
ocks.6.norm2.bias', 'blocks.6.mlp.fc1.weight', 'blocks.6.mlp.fc1.bias', 'blocks.6.mlp.fc2.weight', 'blocks.6.mlp.fc2.bias', 'blocks.7.norm1.weight', 'blocks.7.norm1.bias', 'blocks.7.attn.qkv.weight', 'blocks.7.attn.qkv.bias', 'blocks.7.attn.proj.weight', 'blocks.7.attn.pr
oj.bias', 'blocks.7.norm2.weight', 'blocks.7.norm2.bias', 'blocks.7.mlp.fc1.weight', 'blocks.7.mlp.fc1.bias', 'blocks.7.mlp.fc2.weight', 'blocks.7.mlp.fc2.bias', 'blocks.8.norm1.weight', 'blocks.8.norm1.bias', 'blocks.8.attn.qkv.weight', 'blocks.8.attn.qkv.bias', 'blocks.
8.attn.proj.weight', 'blocks.8.attn.proj.bias', 'blocks.8.norm2.weight', 'blocks.8.norm2.bias', 'blocks.8.mlp.fc1.weight', 'blocks.8.mlp.fc1.bias', 'blocks.8.mlp.fc2.weight', 'blocks.8.mlp.fc2.bias', 'blocks.9.norm1.weight', 'blocks.9.norm1.bias', 'blocks.9.attn.qkv.weigh
t', 'blocks.9.attn.qkv.bias', 'blocks.9.attn.proj.weight', 'blocks.9.attn.proj.bias', 'blocks.9.norm2.weight', 'blocks.9.norm2.bias', 'blocks.9.mlp.fc1.weight', 'blocks.9.mlp.fc1.bias', 'blocks.9.mlp.fc2.weight', 'blocks.9.mlp.fc2.bias', 'blocks.10.norm1.weight', 'blocks.
10.norm1.bias', 'blocks.10.attn.qkv.weight', 'blocks.10.attn.qkv.bias', 'blocks.10.attn.proj.weight', 'blocks.10.attn.proj.bias', 'blocks.10.norm2.weight', 'blocks.10.norm2.bias', 'blocks.10.mlp.fc1.weight', 'blocks.10.mlp.fc1.bias', 'blocks.10.mlp.fc2.weight', 'blocks.10
.mlp.fc2.bias', 'blocks.11.norm1.weight', 'blocks.11.norm1.bias', 'blocks.11.attn.qkv.weight', 'blocks.11.attn.qkv.bias', 'blocks.11.attn.proj.weight', 'blocks.11.attn.proj.bias', 'blocks.11.norm2.weight', 'blocks.11.norm2.bias', 'blocks.11.mlp.fc1.weight', 'blocks.11.mlp
.fc1.bias', 'blocks.11.mlp.fc2.weight', 'blocks.11.mlp.fc2.bias', 'blocks.12.norm1.weight', 'blocks.12.norm1.bias', 'blocks.12.attn.qkv.weight', 'blocks.12.attn.qkv.bias', 'blocks.12.attn.proj.weight', 'blocks.12.attn.proj.bias', 'blocks.12.norm2.weight', 'blocks.12.norm2
.bias', 'blocks.12.mlp.fc1.weight', 'blocks.12.mlp.fc1.bias', 'blocks.12.mlp.fc2.weight', 'blocks.12.mlp.fc2.bias', 'blocks.13.norm1.weight', 'blocks.13.norm1.bias', 'blocks.13.attn.qkv.weight', 'blocks.13.attn.qkv.bias', 'blocks.13.attn.proj.weight', 'blocks.13.attn.proj
.bias', 'blocks.13.norm2.weight', 'blocks.13.norm2.bias', 'blocks.13.mlp.fc1.weight', 'blocks.13.mlp.fc1.bias', 'blocks.13.mlp.fc2.weight', 'blocks.13.mlp.fc2.bias', 'blocks.14.norm1.weight', 'blocks.14.norm1.bias', 'blocks.14.attn.qkv.weight', 'blocks.14.attn.qkv.bias',
'blocks.14.attn.proj.weight', 'blocks.14.attn.proj.bias', 'blocks.14.norm2.weight', 'blocks.14.norm2.bias', 'blocks.14.mlp.fc1.weight', 'blocks.14.mlp.fc1.bias', 'blocks.14.mlp.fc2.weight', 'blocks.14.mlp.fc2.bias', 'blocks.15.norm1.weight', 'blocks.15.norm1.bias', 'block
s.15.attn.qkv.weight', 'blocks.15.attn.qkv.bias', 'blocks.15.attn.proj.weight', 'blocks.15.attn.proj.bias', 'blocks.15.norm2.weight', 'blocks.15.norm2.bias', 'blocks.15.mlp.fc1.weight', 'blocks.15.mlp.fc1.bias', 'blocks.15.mlp.fc2.weight', 'blocks.15.mlp.fc2.bias', 'block
s.16.norm1.weight', 'blocks.16.norm1.bias', 'blocks.16.attn.qkv.weight', 'blocks.16.attn.qkv.bias', 'blocks.16.attn.proj.weight', 'blocks.16.attn.proj.bias', 'blocks.16.norm2.weight', 'blocks.16.norm2.bias', 'blocks.16.mlp.fc1.weight', 'blocks.16.mlp.fc1.bias', 'blocks.16
.mlp.fc2.weight', 'blocks.16.mlp.fc2.bias', 'blocks.17.norm1.weight', 'blocks.17.norm1.bias', 'blocks.17.attn.qkv.weight', 'blocks.17.attn.qkv.bias', 'blocks.17.attn.proj.weight', 'blocks.17.attn.proj.bias', 'blocks.17.norm2.weight', 'blocks.17.norm2.bias', 'blocks.17.mlp
.fc1.weight', 'blocks.17.mlp.fc1.bias', 'blocks.17.mlp.fc2.weight', 'blocks.17.mlp.fc2.bias', 'blocks.18.norm1.weight', 'blocks.18.norm1.bias', 'blocks.18.attn.qkv.weight', 'blocks.18.attn.qkv.bias', 'blocks.18.attn.proj.weight', 'blocks.18.attn.proj.bias', 'blocks.18.nor
m2.weight', 'blocks.18.norm2.bias', 'blocks.18.mlp.fc1.weight', 'blocks.18.mlp.fc1.bias', 'blocks.18.mlp.fc2.weight', 'blocks.18.mlp.fc2.bias', 'blocks.19.norm1.weight', 'blocks.19.norm1.bias', 'blocks.19.attn.qkv.weight', 'blocks.19.attn.qkv.bias', 'blocks.19.attn.proj.w
eight', 'blocks.19.attn.proj.bias', 'blocks.19.norm2.weight', 'blocks.19.norm2.bias', 'blocks.19.mlp.fc1.weight', 'blocks.19.mlp.fc1.bias', 'blocks.19.mlp.fc2.weight', 'blocks.19.mlp.fc2.bias', 'blocks.20.norm1.weight', 'blocks.20.norm1.bias', 'blocks.20.attn.qkv.weight',
 'blocks.20.attn.qkv.bias', 'blocks.20.attn.proj.weight', 'blocks.20.attn.proj.bias', 'blocks.20.norm2.weight', 'blocks.20.norm2.bias', 'blocks.20.mlp.fc1.weight', 'blocks.20.mlp.fc1.bias', 'blocks.20.mlp.fc2.weight', 'blocks.20.mlp.fc2.bias', 'blocks.21.norm1.weight', 'b
locks.21.norm1.bias', 'blocks.21.attn.qkv.weight', 'blocks.21.attn.qkv.bias', 'blocks.21.attn.proj.weight', 'blocks.21.attn.proj.bias', 'blocks.21.norm2.weight', 'blocks.21.norm2.bias', 'blocks.21.mlp.fc1.weight', 'blocks.21.mlp.fc1.bias', 'blocks.21.mlp.fc2.weight', 'blo
cks.21.mlp.fc2.bias', 'blocks.22.norm1.weight', 'blocks.22.norm1.bias', 'blocks.22.attn.qkv.weight', 'blocks.22.attn.qkv.bias', 'blocks.22.attn.proj.weight', 'blocks.22.attn.proj.bias', 'blocks.22.norm2.weight', 'blocks.22.norm2.bias', 'blocks.22.mlp.fc1.weight', 'blocks.
22.mlp.fc1.bias', 'blocks.22.mlp.fc2.weight', 'blocks.22.mlp.fc2.bias', 'blocks.23.norm1.weight', 'blocks.23.norm1.bias', 'blocks.23.attn.qkv.weight', 'blocks.23.attn.qkv.bias', 'blocks.23.attn.proj.weight', 'blocks.23.attn.proj.bias', 'blocks.23.norm2.weight', 'blocks.23
.norm2.bias', 'blocks.23.mlp.fc1.weight', 'blocks.23.mlp.fc1.bias', 'blocks.23.mlp.fc2.weight', 'blocks.23.mlp.fc2.bias', 'norm.weight', 'norm.bias', 'decoder_pred.weight', 'decoder_pred.bias', 'decoder_blocks.4.norm1.weight', 'decoder_blocks.4.norm1.bias', 'decoder_block
s.4.attn.qkv.weight', 'decoder_blocks.4.attn.qkv.bias', 'decoder_blocks.4.attn.proj.weight', 'decoder_blocks.4.attn.proj.bias', 'decoder_blocks.4.norm2.weight', 'decoder_blocks.4.norm2.bias', 'decoder_blocks.4.mlp.fc1.weight', 'decoder_blocks.4.mlp.fc1.bias', 'decoder_blo
cks.4.mlp.fc2.weight', 'decoder_blocks.4.mlp.fc2.bias', 'decoder_blocks.5.norm1.weight', 'decoder_blocks.5.norm1.bias', 'decoder_blocks.5.attn.qkv.weight', 'decoder_blocks.5.attn.qkv.bias', 'decoder_blocks.5.attn.proj.weight', 'decoder_blocks.5.attn.proj.bias', 'decoder_b
locks.5.norm2.weight', 'decoder_blocks.5.norm2.bias', 'decoder_blocks.5.mlp.fc1.weight', 'decoder_blocks.5.mlp.fc1.bias', 'decoder_blocks.5.mlp.fc2.weight', 'decoder_blocks.5.mlp.fc2.bias', 'decoder_blocks.6.norm1.weight', 'decoder_blocks.6.norm1.bias', 'decoder_blocks.6.
attn.qkv.weight', 'decoder_blocks.6.attn.qkv.bias', 'decoder_blocks.6.attn.proj.weight', 'decoder_blocks.6.attn.proj.bias', 'decoder_blocks.6.norm2.weight', 'decoder_blocks.6.norm2.bias', 'decoder_blocks.6.mlp.fc1.weight', 'decoder_blocks.6.mlp.fc1.bias', 'decoder_blocks.
6.mlp.fc2.weight', 'decoder_blocks.6.mlp.fc2.bias', 'decoder_blocks.7.norm1.weight', 'decoder_blocks.7.norm1.bias', 'decoder_blocks.7.attn.qkv.weight', 'decoder_blocks.7.attn.qkv.bias', 'decoder_blocks.7.attn.proj.weight', 'decoder_blocks.7.attn.proj.bias', 'decoder_block
s.7.norm2.weight', 'decoder_blocks.7.norm2.bias', 'decoder_blocks.7.mlp.fc1.weight', 'decoder_blocks.7.mlp.fc1.bias', 'decoder_blocks.7.mlp.fc2.weight', 'decoder_blocks.7.mlp.fc2.bias'])

The warning message given by fvcore doesn't matter. fvcore.common.checkpoint is only used for resuming training; it won't re-initialize the already loaded MAE pretrained weights.
BTW, from your training command it looks like you are using ViT-B with ViT-L pretrained weights; you may want to double-check that.
Hope your research goes well :)
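
A minimal, self-contained sketch (toy module and key names, not MIMDet's actual loading code) of where these messages come from: PyTorch's load_state_dict(..., strict=False) reports mismatched keys in an _IncompatibleKeys named tuple instead of raising an error.

```python
import torch
import torch.nn as nn

# Toy stand-in for the detector's ViT encoder: one submodule that exists in the
# MAE checkpoint ("blocks") and one newly added module ("patch_embed_conv")
# that does not, so it will show up under missing_keys.
encoder = nn.Module()
encoder.blocks = nn.Linear(4, 4)
encoder.patch_embed_conv = nn.Conv2d(3, 4, 3)

# Toy "MAE checkpoint": encoder weights plus a decoder-only tensor that the
# detector never uses, so it will show up under unexpected_keys.
mae_ckpt = {
    "blocks.weight": torch.zeros(4, 4),
    "blocks.bias": torch.zeros(4),
    "decoder_pred.weight": torch.zeros(4, 4),
}

result = encoder.load_state_dict(mae_ckpt, strict=False)
print(result.missing_keys)     # ['patch_embed_conv.weight', 'patch_embed_conv.bias']
print(result.unexpected_keys)  # ['decoder_pred.weight']
```

In the log above, the missing patch_embed.proj.* keys appear to correspond to the new convolutional patch embedding, which is trained from scratch, while the unexpected cls_token / mask_token / decoder_* keys are MAE components the detector's encoder simply does not use, so both lists are expected.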

@vealocia
Thanks for the quick reply!!

I have fixed my question; the messages were actually from training the vit_large model.

In general, we see such a message when loading pre-trained weights.

But I was confused by the "No checkpoint found. Initializing model from scratch" message.

Anyway, you mean that it's not a problem at all.

Thanks!

If your run cannot match the results in our log, please let us know :)
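
One way to double-check that the MAE weights really were loaded is to compare a backbone tensor against the raw checkpoint after building the model. The sketch below assumes the detectron2 LazyConfig API, the config/checkpoint paths from the command above (".py" suffix assumed), and that the weights are loaded at model-construction time, as the "Loading ViT Encoder pretrained weights ..." message suggests; the parameter prefix on the MIMDet side may differ, so adjust the name if needed.

```python
import torch
from detectron2.config import LazyConfig, instantiate

# Paths taken from the training command above (the ".py" suffix is assumed).
cfg = LazyConfig.load(
    "configs/mimdet/mimdet_vit_large_mask_rcnn_fpn_mr_0p5_800_1333_4xdec_coco_3x.py"
)
cfg.mae_checkpoint.path = "./checkpoints/mae_pretrain_vit_large_full.pth"

# Assumption: MIMDet loads the MAE pretrained weights while the model is built.
model = instantiate(cfg.model)

ckpt = torch.load(cfg.mae_checkpoint.path, map_location="cpu")
state = ckpt.get("model", ckpt)  # MAE checkpoints usually nest weights under "model"

# Compare one encoder weight that exists in both; the model-side prefix depends
# on how MIMDet names its encoder, hence the endswith() match.
ckpt_key = "blocks.0.attn.qkv.weight"
model_sd = model.state_dict()
for name in (k for k in model_sd if k.endswith(ckpt_key)):
    print(name, torch.allclose(model_sd[name].cpu(), state[ckpt_key]))
```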