RuntimeError: cudnn RNN backward can only be called in training mode
Yu-Wu opened this issue · 10 comments
Hi,
Thanks for your great work.
However, I ran into a problem when running the code.
I followed the instructions, put the required files in the /data folder, and ran the training command.
My environment: PyTorch 0.3.1, CUDA 9.0, cuDNN 7.1.2.
Could you help me run the script correctly?
➜ ave_code git:(master) ✗ source activate pytorch0.3.1
(pytorch0.3.1) ➜ ave_code git:(master) ✗ python weak_supervised_main.py --train
/home/wuyu07/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
3517
=== Epoch {0} Loss: {0.7096} Running time: {2.684252}
0.06890547263681591
Traceback (most recent call last):
File "weak_supervised_main.py", line 171, in <module>
train(args)
File "weak_supervised_main.py", line 93, in train
loss.backward()
File "/home/wuyu07/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/wuyu07/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cudnn RNN backward can only be called in training mode
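For context: cuDNN's fused RNN kernels only implement the backward pass for training mode, so calling loss.backward() through an RNN that is in eval() mode raises exactly this error on newer PyTorch versions. A minimal sketch of the two usual workarounds, using a stand-in nn.LSTM rather than the model from this repo:

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=8, hidden_size=16).cuda()
x = torch.randn(5, 3, 8, device='cuda', requires_grad=True)

# Workaround 1: make sure the module is in training mode before backward.
rnn.train()
out, _ = rnn(x)
out.sum().backward()  # works

# Workaround 2: if the model must stay in eval mode, run the forward AND
# the backward with cuDNN disabled, so autograd records the native kernels.
rnn.eval()
with torch.backends.cudnn.flags(enabled=False):
    out, _ = rnn(x)
    out.sum().backward()  # works: falls back to the non-cuDNN implementation
```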
I just did a fresh install on a machine and ran the code.
Environment: 0.3.1-py36_cuda9.1.85_cudnn7.0.5_2 pytorch [cuda91]
Results:
3517
=== Epoch {0} Loss: {0.7096} Running time: {2.906535}
0.08358208955223881
=== Epoch {1} Loss: {0.7093} Running time: {1.947218}
=== Epoch {2} Loss: {0.7089} Running time: {1.911076}
=== Epoch {3} Loss: {0.7080} Running time: {1.911666}
=== Epoch {4} Loss: {0.7067} Running time: {1.905607}
=== Epoch {5} Loss: {0.7043} Running time: {1.963768}
0.3875621890547264
I could not reproduce the issue. Could you configure the same environment? The problem may come from the different cuDNN version.
BTW, there was a small typo in the previous code: "data/video_feature_noisy.h5" should be "data/visual_feature_noisy.h5". I think you already noticed that.
Thanks for your quick reply.
I finally found the reason: I ran your code in PyTorch 1.0 by mistake, instead of PyTorch 0.3.1.
Loading model parameters.
/usr/local/lib/python3.7/dist-packages/torchtext/data/field.py:197: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
return Variable(arr, volatile=not train), lengths
/usr/local/lib/python3.7/dist-packages/torchtext/data/field.py:198: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
return Variable(arr, volatile=not train)
/content/Seq2Sick/onmt/translate/Translator.py:48: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
def var(a): return Variable(a, volatile=True)
/content/Seq2Sick/onmt/modules/GlobalAttention.py:179: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
align_vectors = self.sm(align.view(batch*targetL, sourceL))
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py:119: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
input = module(input)
/content/Seq2Sick/onmt/translate/Translator.py:191: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
src.volatile = False
attack.py:64: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
output_a, attn, output_i= translator.getOutput(new_embedding, src, batch)
tensor(18.6335, device='cuda:0') tensor(0., device='cuda:0')
tensor(999., device='cuda:0') tensor(0., device='cuda:0')
Traceback (most recent call last):
File "attack.py", line 312, in <module>
main()
File "attack.py", line 272, in main
modifier, output_a, attn, new_word, output_i, CFLAG = attack(all_word_embedding, label_onehot, translator, src, batch, new_embedding, input_embedding, modifier, const, GROUP_LASSO, TARGETED, GRAD_REG, NN)
File "attack.py", line 138, in attack
loss.backward(retain_graph=True)
File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 147, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: cudnn RNN backward can only be called in training mode
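This case is the trickier variant: the attack needs gradients through a translation model that is deliberately kept in eval mode, so calling model.train() would re-enable dropout and change the outputs. A hedged sketch of the usual workaround instead; translator.getOutput and loss.backward(retain_graph=True) are taken from the traceback above, everything else is illustrative:

```python
import torch

# Both the forward and the backward must run with cuDNN disabled; disabling
# it only around loss.backward() is not enough, because autograd already
# recorded the cuDNN RNN node during the forward pass.
with torch.backends.cudnn.flags(enabled=False):
    output_a, attn, output_i = translator.getOutput(new_embedding, src, batch)
    # ... compute loss from output_a as attack.py does ...
    loss.backward(retain_graph=True)

# Alternatively, disable cuDNN globally at the top of attack.py:
# torch.backends.cudnn.enabled = False
```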
@Yu-Wu @YapengTian Please look into this as well.
This looks similar to the original issue. Could you check your PyTorch version? Please use PyTorch 0.3.1.
Still the same error @YapengTian
> Still the same error @YapengTian

You can try adding "net_model.train()" at the beginning of the training loop, i.e. right inside "for epoch in range(args.nb_epoch)" (see the sketch below).
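To make that concrete, a minimal sketch of where the call would go; net_model, args.nb_epoch, and loss.backward() come from this thread, while the loader, optimizer, and criterion are hypothetical placeholders:

```python
for epoch in range(args.nb_epoch):
    net_model.train()  # re-enter training mode each epoch, in case an
                       # evaluation pass left the model in eval() mode
    for audio, video, labels in train_loader:  # hypothetical loader
        optimizer.zero_grad()
        output = net_model(audio, video)       # hypothetical forward signature
        loss = criterion(output, labels)
        loss.backward()   # cuDNN RNN backward now runs in training mode
        optimizer.step()
    net_model.eval()      # any per-epoch evaluation runs in eval mode
```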
I know you must have tried to install that PyTorch version, but could you print the version out, in case it was not installed successfully? (import torch; print(torch.__version__))
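That is, a quick sanity check inside the active environment:

```python
import torch
print(torch.__version__)  # should print 0.3.1 for this repo; note the
                          # double underscores -- torch.version is a module
```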
> Still the same error @YapengTian
>
> You can try adding "net_model.train()" at the beginning of the training loop inside "for epoch in range(args.nb_epoch)".

I solved the problem following your comment, thank you so much bro!