aofrancani/TSformer-VO

An error occurs with pretrained_ViT: True

Closed this issue · 6 comments

Thank you for sharing your great work.

When setting pretrained_ViT: True in the args of train.py, the following error occurs. I confirmed that the ViT model was downloaded successfully. Could you tell me how to solve it?

Building model...
--- loading pretrained to start training ---
https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth

Downloading: "https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth" to /home/dmsai3/.cache/torch/hub/checkpoints/deit_small_patch16_224-cd65a155.pth
Traceback (most recent call last):
  File "train.py", line 239, in <module>
    model, args = build_model(args, model_params)
  File "/home/dmsai3/TSformer-VO/build_model.py", line 97, in build_model
    load_pretrained(model, num_classes=model_params["num_classes"],
  File "/home/dmsai3/TSformer-VO/timesformer/models/helpers.py", line 161, in load_pretrained
    elif num_classes != state_dict[classifier_name + '.weight'].size(0):
KeyError: 'head.weight'

Hey, tks for trying out this work, I really appreciate it.

Since you are loading the pretrained ViT, I believe you should use the same hyperparameters as the architecture you are loading (I'm using patch_size and embed_dim to get the checkpoint filename; see https://github.com/aofrancani/TSformer-VO/blob/main/build_model.py#L91).

For example, try the following architectures if you want to load the ViT (model_params in train.py; see the sketch after this list):

tiny --> patch_size=16, embed_dim=192, depth=12, num_heads=3
small --> patch_size=16, embed_dim=384, depth=12, num_heads=6
base --> patch_size=16, embed_dim=768, depth=12, num_heads=12

(you could also try different DeiT architectures: https://github.com/facebookresearch/deit/blob/main/models.py)
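For instance, a rough sketch of the small variant in model_params could look like this; "patch_size", "dim", and "heads" are the key names mentioned in this thread, while "depth" and the remaining entries are my assumptions, so check them against train.py:

# Rough sketch of the small variant in model_params (train.py).
# "patch_size", "dim", and "heads" are the key names used in this thread;
# "depth" and the remaining entries are assumptions -- check train.py.
model_params = {
    "patch_size": 16,  # must match the DeiT checkpoint (deit_small_patch16_224)
    "dim": 384,        # embed_dim of deit_small
    "depth": 12,       # assumed key name for the transformer depth
    "heads": 6,
    # ... keep the other task-specific entries (e.g. num_classes) unchanged
}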

I'm sorry it is still hardcoded; I'm not planning to improve it right now, but I think this should work quickly for you.

Thank you for your explanation. I understood that I encountered the error because I downloaded a pretrained ViT that does not match the patch_size and embed_dim. However, as you know, default_cfgs in build_model.py specifies the URL of the pretrained ViT according to patch_size and embed_dim, and those values are taken from model_params in train.py. So I understood that the code is already written to download the matching pretrained ViT based on patch_size and embed_dim.

Indeed, when I set "patch_size", "dim", and "heads" in model_params of train.py to 12, 384, and 6 respectively, I confirmed that it downloaded the small model (deit_small_patch16_224-cd65a155.pth), so I verified that the error was not caused by a mismatch between model_params and the pretrained ViT.

Upon reviewing the error message, I noticed that the error occurred in the code snippet from helpers.py:

classifier_name = cfg['classifier']
...
elif num_classes != state_dict[classifier_name + '.weight'].size(0):

Here, classifier_name is assigned from cfg['classifier'], and the value of 'classifier' in default_cfgs in build_model.py is set to 'head'. So I suspect the KeyError: 'head.weight' occurs because the downloaded state_dict contains no 'head.weight' key. If my understanding of the error is correct, could you suggest another solution?
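For reference, the failing lookup can be reproduced standalone; this is my own sketch using only the checkpoint URL and the cfg value above, not code from the repo:

# Standalone sketch: load the checkpoint and repeat the failing lookup.
import torch.utils.model_zoo as model_zoo

state_dict = model_zoo.load_url(
    "https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth",
    map_location="cpu",
)
classifier_name = "head"  # value of 'classifier' in default_cfgs
weight = state_dict[classifier_name + ".weight"]  # raises KeyError: 'head.weight'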

Ok, I see... I don't know if it was a typo, but you meant "patch_size=16" instead of 12, right? The 12 is the depth of "deit_small".

I couldn't reproduce your error; I ran a small test on Google Colab and was able to load the pretrained ViT model. If it was not a typo, let me know and I can test it later on my machine (I can't test it right now, probably in the next 2 days).

Yes, "12" was indeed a typo. I had set the patch_size to 16. It worked well in Colab, which is a relief. I'll wait to test it on your machine. Thank you for your attention to this matter.

helpers.py is already updated.

The problem was that state_dict['head.weight'] does not exist, but state_dict['model']['head.weight'] does. So I just added a few lines for the case when the state_dict has the weights under a 'model' key:

https://github.com/aofrancani/TSformer-VO/blob/main/timesformer/models/helpers.py#L109

state_dict = model_zoo.load_url(cfg['url'], progress=False, map_location='cpu')
# DeiT checkpoints nest the weights under a 'model' key; unwrap when present
try:
    state_dict = state_dict['model']
except KeyError:
    pass
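If you prefer to avoid the try/except, an equivalent explicit check would be (just a suggestion, not what is committed):

# Equivalent explicit form of the unwrapping above (suggestion only)
if 'model' in state_dict:
    state_dict = state_dict['model']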

It worked for me in Google Colab because I had run into this issue before and solved it there in parallel, forgetting to update the code in this repo.
Tks for pointing that out, I hope it works now :)

Yes, training works well without errors with the updated code.
Thank you for fixing the issue and for your attentive support. :)