tgc1997/RMN

Size mismatch error when using the pretrained model you provide

Closed this issue · 5 comments

Awesome work!
When I tried to reproduce the results reported in this repository (a CIDEr score of 97.8 on the MSVD dataset), running evaluate.py with your pretrained checkpoint results/msvd_model/msvd_best_cider.pth raised size-mismatch errors for the whole CapModel.
For example:
RuntimeError: Error(s) in loading state_dict for CapModel:
size mismatch for encoder.bi_lstm1.weight_ih_l0: copying a param with shape torch.Size([2048, 1000]) from checkpoint, the shape in current model is torch.Size([5200, 1000]).
size mismatch ……
size mismatch ……
It seems you may have modified the model without updating msvd_best_cider.pth.
If so, please let me know.
I would appreciate it if you could provide the updated .pth file so that I can reproduce the results reported in this repository.
By the way, why were the higher final results not published in the paper?
Thanks!
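For readers hitting the same error: assuming encoder.bi_lstm1 is a standard torch.nn.LSTM (the thread does not confirm this), PyTorch stacks the four gate matrices row-wise, so weight_ih_l0 has shape (4 * hidden_size, input_size). The row count in a mismatch message therefore reveals the hidden size a checkpoint was trained with. A minimal sketch; the helper names are hypothetical and the code needs no torch:

```python
from typing import Tuple

# Assumption: encoder.bi_lstm1 is a standard torch.nn.LSTM. PyTorch stacks the
# input, forget, cell and output gate weights row-wise, so weight_ih_l0 has
# shape (4 * hidden_size, input_size) and weight_hh_l0 has (4 * hidden_size, hidden_size).


def lstm_weight_ih_shape(hidden_size: int, input_size: int) -> Tuple[int, int]:
    """Expected shape of weight_ih_l0 (also of weight_ih_l0_reverse when bidirectional)."""
    return (4 * hidden_size, input_size)


def infer_hidden_size(weight_ih_rows: int) -> int:
    """Recover hidden_size from the first dimension of a checkpoint's weight_ih_l0."""
    if weight_ih_rows % 4 != 0:
        raise ValueError(f"{weight_ih_rows} rows is not a multiple of 4; not a plain LSTM weight")
    return weight_ih_rows // 4


if __name__ == "__main__":
    # The checkpoint in the error message stores weight_ih_l0 as [2048, 1000].
    print(infer_hidden_size(2048))          # 512
    print(lstm_weight_ih_shape(512, 1000))  # (2048, 1000)
```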

Sorry, I cannot reproduce your errors. You can check the tensors' sizes step by step (5200 is a strange number). Training of the model is not very stable, so the final MSVD result in the paper is the average of three models' results.
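Checking the tensors' sizes step by step can be automated by diffing parameter shapes. A minimal sketch, assuming the .pth file holds a plain state_dict: the comment shows how the two shape dicts could be built with standard PyTorch calls, while the hypothetical helper find_shape_mismatches works on plain dicts so the comparison logic runs without torch.

```python
from typing import Dict, List, Tuple

# In PyTorch the inputs could be built as (assuming a plain state_dict checkpoint):
#   ckpt  = {k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}
#   model = {k: tuple(v.shape) for k, v in cap_model.state_dict().items()}
Shape = Tuple[int, ...]


def find_shape_mismatches(ckpt: Dict[str, Shape], model: Dict[str, Shape]) -> List[str]:
    """List keys whose shapes differ, plus keys present on only one side."""
    report = []
    for name in sorted(ckpt.keys() | model.keys()):
        if name not in model:
            report.append(f"unexpected key in checkpoint: {name}")
        elif name not in ckpt:
            report.append(f"missing key in checkpoint: {name}")
        elif ckpt[name] != model[name]:
            report.append(f"{name}: checkpoint {ckpt[name]} vs model {model[name]}")
    return report


if __name__ == "__main__":
    # Shapes taken from the error message in this issue.
    ckpt = {"encoder.bi_lstm1.weight_ih_l0": (2048, 1000)}
    model = {"encoder.bi_lstm1.weight_ih_l0": (5200, 1000)}
    for line in find_shape_mismatches(ckpt, model):
        print(line)
```

The "missing"/"unexpected" wording follows the terminology PyTorch uses when load_state_dict reports key differences.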

Thanks for your tips.
I tried to reproduce the project again on another machine, but the same error is reported.
Here is the error I hit before:
[Screenshot from 2021-04-13 13-37-40]
I tried to debug evaluate.py following your suggestion.
The strange number 5200 shows up as follows:
[Screenshot from 2021-04-13 13-33-43]
The errors may be caused by the parameter sizes inside bi_lstm.
Could you share a screenshot like my second one, taken while running evaluate.py in debug mode on your side?
This confuses me a lot, and I really want to find out the reason.
I still suspect the core problem is that the network structure is incompatible with the MSVD checkpoint file.
Thank you again for your kind and generous help!

Make sure you didn't change the size parameters in utils.opt.py, and run with the following command:
python evaluate.py --dataset=msvd --model=RMN --result_dir=results/msvd_model --use_loc --use_rel --use_func --hidden_size=512 --att_size=1024 --test_batch_size=2 --beam_size=2 --eval_metric=CIDEr

I also tried printing the sizes as you did, but didn't find anything wrong:
[Screenshot: weight_size]

Note that the hidden size for MSVD is 512, as mentioned in the paper.
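Because a wrong --hidden_size only surfaces as a wall of size-mismatch errors, a small guard could fail earlier with a clearer message. This is a hypothetical sketch, not part of the RMN code or its utils.opt.py: the 512 and 1000 values come from this thread, and the dataset key "msr-vtt" is an assumed spelling.

```python
import argparse
from typing import List, Optional

# Values taken from this thread; the "msr-vtt" key spelling is an assumption.
EXPECTED_HIDDEN_SIZE = {"msvd": 512, "msr-vtt": 1000}


def check_hidden_size(dataset: str, hidden_size: int) -> None:
    """Raise ValueError if hidden_size differs from the value known for dataset."""
    expected = EXPECTED_HIDDEN_SIZE.get(dataset)
    if expected is not None and hidden_size != expected:
        raise ValueError(
            f"--hidden_size={hidden_size}, but the {dataset} checkpoint was trained with {expected}"
        )


def parse_and_check(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse only the two relevant flags (ignoring all others) and validate them."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--hidden_size", type=int, required=True)
    args, _unknown = parser.parse_known_args(argv)
    check_hidden_size(args.dataset, args.hidden_size)
    return args


if __name__ == "__main__":
    # Mirrors part of the evaluate.py command above; --hidden_size=1000 would raise ValueError.
    parse_and_check(["--dataset=msvd", "--hidden_size=512", "--use_loc", "--beam_size=2"])
```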

Oops... I got it wrong, and you found it!
The hidden size in my run command was still 1000 (the MSR-VTT setting) instead of 512 for MSVD.
You're so cool and nice.
Thanks a lot for your timely help and reply! :)