xiadingZ/video-caption.pytorch

Question about LSTM?

Stephen-Adams opened this issue · 3 comments

I tried the code with '--rnn_type lstm'. Training with the S2VTModel is normal, but when I try to evaluate the results, I get the following error:

('vocab size is ', 16860)
('number of train videos: ', 6513)
('number of val videos: ', 497)
('number of test videos: ', 2990)
load feats from [u'data/feats/resnet152']
('max sequence length in data is', 28)
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "eval.py", line 148, in <module>
    main(opt, i)
  File "eval.py", line 75, in main
    dataset = VideoDataset(opt, "val")
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for S2VTAttModel:
While copying the parameter named "encoder.rnn.weight_hh_l0", whose dimensions in the model are torch.Size([1536, 512]) and whose dimensions in the checkpoint are torch.Size([2048, 512]).
While copying the parameter named "encoder.rnn.weight_ih_l0", whose dimensions in the model are torch.Size([1536, 512]) and whose dimensions in the checkpoint are torch.Size([2048, 512]).
While copying the parameter named "encoder.rnn.bias_ih_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "encoder.rnn.bias_hh_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "decoder.rnn.weight_hh_l0", whose dimensions in the model are torch.Size([1536, 512]) and whose dimensions in the checkpoint are torch.Size([2048, 512]).
While copying the parameter named "decoder.rnn.weight_ih_l0", whose dimensions in the model are torch.Size([1536, 1024]) and whose dimensions in the checkpoint are torch.Size([2048, 1024]).
While copying the parameter named "decoder.rnn.bias_ih_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "decoder.rnn.bias_hh_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).

I would like to find out why this happens. Could you help me? Thanks.
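The mismatch pattern (1536 rows in the freshly built model vs. 2048 in the checkpoint) looks like a GRU/LSTM mismatch rather than a corrupted checkpoint. PyTorch concatenates all gate weight matrices row-wise into a single parameter, so with hidden_size=512 an LSTM (4 gates) has 4 × 512 = 2048 rows while a GRU (3 gates) has 3 × 512 = 1536. A minimal sketch with plain torch.nn modules (not the repo's own classes):

```python
import torch.nn as nn

hidden = 512

# PyTorch stacks the per-gate weight matrices row-wise in weight_ih_l0:
# LSTM has 4 gates (input, forget, cell, output) -> 4 * hidden rows,
# GRU has 3 gates (reset, update, new)           -> 3 * hidden rows.
gru = nn.GRU(input_size=512, hidden_size=hidden, num_layers=1)
lstm = nn.LSTM(input_size=512, hidden_size=hidden, num_layers=1)

print(gru.weight_ih_l0.shape)   # torch.Size([1536, 512])
print(lstm.weight_ih_l0.shape)  # torch.Size([2048, 512])
```

So a checkpoint whose parameters have 2048 rows was saved from an LSTM, while the model being rebuilt at eval time appears to be the default GRU.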


Sorry for my mistake...the error information should be:
('vocab size is ', 16860)
('number of train videos: ', 6513)
('number of val videos: ', 497)
('number of test videos: ', 2990)
load feats from [u'data/feats/resnet152']
('max sequence length in data is', 28)
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
  File "eval.py", line 148, in <module>
    main(opt, i)
  File "eval.py", line 75, in main
    dataset = VideoDataset(opt, "val")
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for S2VTModel:
While copying the parameter named "rnn1.weight_hh_l0", whose dimensions in the model are torch.Size([1536, 512]) and whose dimensions in the checkpoint are torch.Size([2048, 512]).
While copying the parameter named "rnn1.weight_ih_l0", whose dimensions in the model are torch.Size([1536, 4096]) and whose dimensions in the checkpoint are torch.Size([2048, 4096]).
While copying the parameter named "rnn1.bias_ih_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "rnn1.bias_hh_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "rnn2.weight_hh_l0", whose dimensions in the model are torch.Size([1536, 512]) and whose dimensions in the checkpoint are torch.Size([2048, 512]).
While copying the parameter named "rnn2.weight_ih_l0", whose dimensions in the model are torch.Size([1536, 1024]) and whose dimensions in the checkpoint are torch.Size([2048, 1024]).
While copying the parameter named "rnn2.bias_ih_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "rnn2.bias_hh_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
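The failure can be reproduced directly: an LSTM state_dict and a GRU use identical parameter names (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0), so load_state_dict passes the key check and fails only on the shape comparison, matching the traceback above. A small sketch, again with plain torch.nn modules rather than the repo's S2VTModel:

```python
import torch.nn as nn

# Save weights from an LSTM, then try to load them into a GRU.
lstm = nn.LSTM(input_size=512, hidden_size=512, num_layers=1)
checkpoint = lstm.state_dict()

gru = nn.GRU(input_size=512, hidden_size=512, num_layers=1)
try:
    gru.load_state_dict(checkpoint)  # same keys, wrong shapes (2048 vs 1536 rows)
except RuntimeError as e:
    print("load failed:", type(e).__name__)

# Rebuilding the same architecture loads cleanly.
lstm2 = nn.LSTM(input_size=512, hidden_size=512, num_layers=1)
lstm2.load_state_dict(checkpoint)
```

If that diagnosis is right, the fix would be to build the eval-time model with the same architecture used in training, i.e. pass the same '--rnn_type lstm' option to eval.py (assuming eval.py accepts the same flag that train.py does).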

I met the same problem. How can I solve it? Can you help me? Thank you.

I also encountered the same problem. How can I solve it?