cannot pass the test in the multi-gpu case
wandering007 opened this issue · 1 comment
wandering007 commented
Before testing the efficient densenet implementation, `out = F.dropout(out, p=0.5, training=self.training)` at line 184 in `densenet.py` should be commented out. Then, if I set `multigpus = True` in `test_densenet.py`, running `python test_densenet.py` fails with the following error:
```
Traceback (most recent call last):
  File "test_densenet.py", line 47, in <module>
    out_effi.sum().backward()
  File "/home/changmao/miniconda3/lib/python3.5/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/changmao/miniconda3/lib/python3.5/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
```
I cannot locate the improper in-place operation. All I know is that the error seems to occur after backpropagation through a few of the efficient bottleneck modules. All the code is run with PyTorch v0.4.0.
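This error is not specific to the densenet code; it can be reproduced in a few lines (a minimal sketch, unrelated to this repo): autograd saves `exp`'s output for the backward pass, so editing that output in place invalidates the saved value.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.exp()   # exp's backward re-uses its output b
b.add_(1)     # in-place edit bumps b's version counter

try:
    b.sum().backward()
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    print(type(e).__name__)
```

As an aside, later PyTorch versions (1.0+) offer `torch.autograd.set_detect_anomaly(True)`, which prints the forward-pass traceback of the op whose backward fails, making in-place bugs like this much easier to locate.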
wandering007 commented
I magically fixed the bug by using `momentum=0` for `F.batch_norm` in the `prepare_forward` function. Maybe without the in-place copy operations for `running_mean` and `running_var`, it passes some gradient checks...
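A quick sketch of why `momentum=0` behaves this way (standalone, not the repo's `prepare_forward`): in training mode `F.batch_norm` updates the running statistics in place as `(1 - momentum) * old + momentum * batch_stat`, so with `momentum=0` the stored values are left numerically unchanged.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, 16, 16)
running_mean = torch.zeros(8)
running_var = torch.ones(8)
before = running_mean.clone()

# With momentum=0, the in-place update writes back the old values:
# new = (1 - 0) * old + 0 * batch_mean == old
out = F.batch_norm(x, running_mean, running_var,
                   weight=None, bias=None,
                   training=True, momentum=0.0, eps=1e-5)

print(torch.equal(running_mean, before))  # True
```

Note this only masks the symptom: the buffers are still written in place, just with identical values, and the running statistics are no longer tracked during training.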