OpenGVLab/VideoMAEv2

Unable to load the distilled model weights provided in the model zoo

druefena opened this issue · 2 comments

How can one load and use the pre-trained distilled models from the model zoo?

First, I create the model as follows (I had to comment out all non-default parameters because they are not recognized):

model = create_model(
    'vit_base_patch16_224',
    img_size=224,
    pretrained=False,
    num_classes=710,
    # all_frames=args.num_frames * args.num_segments,
    # tubelet_size=args.tubelet_size,
    # drop_rate=args.drop,
    # drop_path_rate=args.drop_path,
    # attn_drop_rate=args.attn_drop_rate,
    # head_drop_rate=args.head_drop_rate,
    # drop_block_rate=None,
    # use_mean_pooling=args.use_mean_pooling,
    # init_scale=args.init_scale,
    # with_cp=args.with_checkpoint,
)

When I then try to load the weights from
https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/distill/vit_s_k710_dl_from_giant.pth

using the utils.load_state_dict() function, I get multiple errors, including:
size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([768, 3, 2, 16, 16]) from checkpoint, the shape in current model is torch.Size([768, 3, 16, 16]).

I assume this is because the tubelet size is missing: it defaults to 2, which would match the extra dimension in the checkpoint. So the main question is how to load the model, and which model to create.
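To make the mismatch easier to inspect, here is a minimal sketch that compares parameter shapes between a checkpoint and a model. The helper name find_shape_mismatches is hypothetical (it is not part of the repository), and the shapes are copied from the error message above. With real PyTorch objects, the shape dictionaries could be built with something like {k: tuple(v.shape) for k, v in state_dict.items()}.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    ckpt_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every parameter
    present in both dictionaries whose shapes differ.

    Hypothetical helper for illustration only.
    """
    mismatches = []
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Shapes taken from the error message in this issue.
ckpt_shapes = {"patch_embed.proj.weight": (768, 3, 2, 16, 16)}
model_shapes = {"patch_embed.proj.weight": (768, 3, 16, 16)}

for name, ckpt_shape, model_shape in find_shape_mismatches(ckpt_shapes, model_shapes):
    rank_diff = len(ckpt_shape) - len(model_shape)
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape} "
          f"(rank difference: {rank_diff})")
```

A rank difference of 1 on the patch embedding weight suggests the checkpoint was saved from a model with a 3D (video) patch embedding, while the created model has a 2D (image) one.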

Any help appreciated, thanks!

How did you solve this problem? Thank you very much.

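Add "import models  # noqa: F401" to your script before create_model() is called, so that the repository's own model definitions are registered.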
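A library-free sketch of why a bare import can change what create_model returns, assuming the repository registers its video models under the same names through a decorator that runs at import time. Everything below (_REGISTRY, register_model, create_model, import_repo_models) is a hypothetical stand-in written for illustration, not the actual timm or VideoMAEv2 code.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a model registry keyed by model name.
_REGISTRY: Dict[str, Callable[..., dict]] = {}


def register_model(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register fn under its function name, replacing any earlier entry."""
    _REGISTRY[fn.__name__] = fn
    return fn


def create_model(name: str, **kwargs) -> dict:
    """Look up the registered factory by name and call it."""
    return _REGISTRY[name](**kwargs)


# An image model registered by the "library" itself.
@register_model
def vit_base_patch16_224(num_classes: int = 1000) -> dict:
    return {"kind": "image", "patch_weight_rank": 4, "num_classes": num_classes}


# Without the repository import, the image model is returned and
# video-specific arguments are rejected.
before = create_model("vit_base_patch16_224", num_classes=710)
try:
    create_model("vit_base_patch16_224", num_classes=710, tubelet_size=2)
    rejected = False
except TypeError:
    rejected = True


def import_repo_models() -> None:
    """Stands in for `import models`: registration happens as a side effect."""

    @register_model
    def vit_base_patch16_224(num_classes: int = 1000, tubelet_size: int = 2) -> dict:
        return {
            "kind": "video",
            "patch_weight_rank": 5,
            "num_classes": num_classes,
            "tubelet_size": tubelet_size,
        }


import_repo_models()
after = create_model("vit_base_patch16_224", num_classes=710, tubelet_size=2)

print(before["kind"], rejected, after["kind"])
```

Under this assumption, the import is only needed for its side effect, which is why it carries a "noqa: F401" (unused import) marker.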