wandering007/efficient-densenet-pytorch

cannot pass the test in the multi-gpu case

wandering007 opened this issue · 1 comment

Before testing the efficient DenseNet implementation, `out = F.dropout(out, p=0.5, training=self.training)` at line 184 in densenet.py should be commented out.
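For concreteness, the edit is just commenting out that one line (presumably because dropout's random mask would make the efficient and reference outputs differ and the comparison in the test fail):

```python
# densenet.py, around line 184: disable dropout before running the test
# out = F.dropout(out, p=0.5, training=self.training)
```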

Then, if I set `multigpus = True` in test_densenet.py, running `python test_densenet.py` produces the following error:

Traceback (most recent call last):
  File "test_densenet.py", line 47, in <module>
    out_effi.sum().backward()
  File "/home/changmao/miniconda3/lib/python3.5/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/changmao/miniconda3/lib/python3.5/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I cannot locate the improper in-place operation. All I know is that the error seems to occur after backpropagation through a few of the efficient bottleneck modules. All the code is run with PyTorch v0.4.0.
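On newer PyTorch releases than 0.4.0, autograd's anomaly detection can help pinpoint the offending in-place op; a minimal sketch, assuming the test can be run under a recent version:

```python
import torch

# Re-runs the failing backward with extra bookkeeping so the traceback
# points at the forward op whose output was later modified in place.
with torch.autograd.detect_anomaly():
    out_effi.sum().backward()
```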

I magically fixed the bug by using `momentum=0` for `F.batch_norm` in the `prepare_forward` function. Maybe without the in-place copy operations on `running_mean` and `running_var`, it passes the gradient checks...
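A minimal sketch of the workaround (the real `prepare_forward` in this repo has a different signature; the argument list here is illustrative):

```python
import torch
import torch.nn.functional as F

def prepare_forward(inputs, running_mean, running_var, weight, bias, training):
    # Illustrative only: re-normalize the concatenated features for the
    # efficient (recomputing) pass. With momentum=0 the running statistics
    # keep their old values, so this extra forward pass leaves
    # running_mean / running_var numerically unchanged, which seems to be
    # what lets backward() go through.
    out = torch.cat(inputs, dim=1)
    return F.batch_norm(out, running_mean, running_var, weight, bias,
                        training=training, momentum=0)
```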