.backward not working due to in-place ops?

Question

nimasadri11 opened this issue 3 years ago · 1 comment

Hi there, I'm trying to train the model, but I get the following error:

Traceback (most recent call last):
  File "scripts/train.py", line 188, in <module>
    main(args)
  File "scripts/train.py", line 168, in main
    trainer.train(args.start_iter, args.end_iter)
  File "/home/user/train_engine.py", line 137, in train
    self.run_step()
  File "train.py", line 72, in run_step
    print(loss_dict)
  File "/home/user/.local/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [16, 512, 128, 128]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Answer 1 · 2022-05-29T18:21:55.000Z

Hi, could you locate the line that triggers the error?
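
One way to track it down is PyTorch's anomaly detection, which records the forward-pass stack trace of the operation whose backward fails and prints it as a warning above the error. Below is a minimal sketch on CPU, assuming a toy tensor in place of the real model; the function name `find_inplace_culprit` and the shapes are made up for illustration. Note that the "backtrace further above" hint in the reported message looks like the wording PyTorch uses when anomaly detection is already enabled, so the forward trace may already be in the log above the traceback.

```python
import torch
import torch.nn as nn


def find_inplace_culprit():
    """Reproduce the error under anomaly detection and return its message.

    Hypothetical toy example, not code from the repository.
    """
    x = torch.randn(4, 8, requires_grad=True)
    relu = nn.ReLU()

    # Inside this context, autograd records where each forward op was called,
    # so a failing backward also reports the forward-pass stack trace.
    with torch.autograd.detect_anomaly():
        y = relu(x)   # ReLU saves its output for the backward pass
        y += 1        # in-place edit of that saved output
        loss = y.sum()
        try:
            loss.backward()
        except RuntimeError as err:
            return str(err)
    return None


if __name__ == "__main__":
    print(find_inplace_culprit())
```

Anomaly detection slows training noticeably, so it is usually enabled only while debugging.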

I haven't run into this problem myself. My guess is that it's caused by a difference in PyTorch versions.
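
For reference, the message says that a tensor produced by a ReLU ("output 0 of ReluBackward0") was modified in place after the forward pass, while autograd still needed the original values. The sketch below reproduces that pattern on a small CPU tensor and shows the usual workaround of using an out-of-place operation instead. It is an assumption about the kind of code involved, not the actual line in the repository; `backward_ok` is a hypothetical helper.

```python
import torch
import torch.nn as nn


def backward_ok(inplace_add: bool) -> bool:
    """Return True if backward succeeds, False if autograd raises.

    Hypothetical toy example illustrating the error pattern.
    """
    x = torch.randn(2, 3, requires_grad=True)
    y = nn.ReLU()(x)      # autograd keeps this output for ReLU's backward

    if inplace_add:
        y += 1            # in-place: bumps the version of the saved tensor
    else:
        y = y + 1         # out-of-place: creates a new tensor, saved one is untouched

    try:
        y.sum().backward()
    except RuntimeError:
        return False
    return True


if __name__ == "__main__":
    print("in-place:", backward_ok(True))
    print("out-of-place:", backward_ok(False))
```

If the cause is of this kind, replacing operators such as `+=` or methods ending in an underscore (for example `add_`) on the affected tensor with their out-of-place forms should let `backward` run.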