dingmyu/davit

.backward not working due to in-place ops?

nimasadri11 opened this issue · 1 comment

Hi there, I am trying to train the model, but I get the following error:

Traceback (most recent call last):
  File "scripts/train.py", line 188, in <module>
    main(args)
  File "scripts/train.py", line 168, in main
    trainer.train(args.start_iter, args.end_iter)
  File "/home/user/train_engine.py", line 137, in train
    self.run_step()
  File "train.py", line 72, in run_step
    print(loss_dict)
  File "/home/user/.local/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [16, 512, 128, 128]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Hi, could you locate the line that contains the issue?

I didn't encounter this problem myself. My guess is that it comes from a difference in PyTorch versions.
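
For anyone trying to locate the offending line, here is a minimal sketch that reproduces the same class of error outside this repository. It is not taken from the davit code; the function name `backward_fails` and the tensor shape are made up for illustration. The assumption is that the failure follows the usual pattern: ReLU keeps its output for the backward pass, and a later in-place operation on that output changes its version counter. Enabling `torch.autograd.set_detect_anomaly(True)` around the forward pass should make PyTorch also print the traceback of the forward operation whose gradient could not be computed.

```python
import torch


def backward_fails(in_place: bool) -> bool:
    """Return True if backward() raises the in-place modification error."""
    x = torch.randn(4, requires_grad=True)

    # Anomaly mode records forward tracebacks so a failing backward
    # can be traced to the forward op that produced the tensor.
    with torch.autograd.set_detect_anomaly(True):
        y = torch.relu(x)  # ReLU saves its output for the backward pass
        if in_place:
            y += 1  # in-place edit bumps the saved tensor's version
        else:
            y = y + 1  # out-of-place: allocates a new tensor instead
        try:
            y.sum().backward()
        except RuntimeError as err:
            return "inplace operation" in str(err)
    return False


print(backward_fails(True))
print(backward_fails(False))
```

If the pattern matches, replacing the in-place operation with its out-of-place form (for example `y = y + 1` instead of `y += 1`) is a common fix. Whether that applies here depends on which line anomaly mode points to.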