RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Question


Closed this issue 3 years ago · 3 comments

Hi, I'm trying to train your model on the NYUv2 dataset,
but it seems that the backpropagation graph is broken somewhere in your training code.

I've already tried setting the 'inplace=False' option in the ReLU layers, but it doesn't help.
I used anomaly detection to find the erroneous code, and the following is the result:

../torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
  File "main.py", line 364, in <module>
    network.train(starting_epoch=args.epoch)
  File "/workspace/other_algs/nyu/vi_depth_completion/network_run.py", line 334, in train
    self._run_training_iteration(sample_batched, epoch, max_epochs, i, max_iters)
  File "/workspace/other_algs/nyu/vi_depth_completion/network_run.py", line 237, in _run_training_iteration
    cnn_outputs = self._call_cnn(sample_batched)
  File "main.py", line 283, in _call_cnn
    ds[i, 0, ...], homo[i, ...])
  File "main.py", line 151, in extract_plane_images_from_normal_image
    normals = normal_image[mask_for_class]

I think the indexing at "normals = normal_image[mask_for_class]" causes the problem.
Have you ever experienced this kind of error?

Sincerely,
Jinwoo Jeon
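For readers hitting the same message, here is a minimal, self-contained sketch (not code from this repository) that reproduces the RuntimeError and enables anomaly detection through torch.autograd.detect_anomaly(), the mode that produced the traceback above. The tensor names and the choice of sigmoid are illustrative assumptions: sigmoid saves its output for the backward pass, so modifying that output in place invalidates it, while the out-of-place variant works.

```python
import torch


def run(inplace: bool) -> str:
    """Return the backward error message, or "ok" if backward succeeds."""
    x = torch.randn(4, requires_grad=True)
    # Anomaly mode records the forward traceback of every op, so when backward
    # fails PyTorch also prints where the offending tensor was produced.
    with torch.autograd.detect_anomaly():
        y = torch.sigmoid(x)  # sigmoid saves its output y for the backward pass
        if inplace:
            y.mul_(2)  # in-place: bumps y's version counter, so the saved y is stale
        else:
            y = y * 2  # out-of-place: a new tensor; the saved y is untouched
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return "ok"


print(run(inplace=True))   # message contains "modified by an inplace operation"
print(run(inplace=False))  # ok
```

The forward traceback printed by anomaly mode points at the op whose saved tensor was later modified, which is why it can blame a line (here the sigmoid, in the issue the indexing) that is not itself the in-place operation.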

Answer 1 · 2022-01-21T18:50:54.000Z

Hi Jinwoo. I'm not sure this is the full error message; the full message should contain more information. But if the issue is with backpropagation, then, if I remember correctly, this part of the code (constructing the planes and the incomplete depth) should not produce gradients at all. It only produces an augmented depth that becomes the new input. Could you try putting the code that runs before the actual CNN call inside a "with torch.no_grad():" block?
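A rough sketch of that suggestion, under stated assumptions: plane_net and cnn are hypothetical stand-in modules (not the repository's), and torch.where merely stands in for the real plane-based densification. Only the CNN call is tracked by autograd; everything that builds the augmented input runs under torch.no_grad().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a plane/normal estimator used only to densify the
# input, and the depth-completion CNN that is actually being trained.
plane_net = nn.Conv2d(3, 1, kernel_size=3, padding=1)
cnn = nn.Conv2d(4, 1, kernel_size=3, padding=1)


def training_step(rgb, sparse_depth, target):
    # No autograd graph is recorded here: plane_net receives no gradients
    # and backward() never revisits these ops.
    with torch.no_grad():
        dense_guess = plane_net(rgb)
        aug_depth = torch.where(sparse_depth > 0, sparse_depth, dense_guess)
    # Only the CNN forward pass is tracked.
    pred = cnn(torch.cat([rgb, aug_depth], dim=1))
    loss = F.l1_loss(pred, target)
    loss.backward()
    return aug_depth, loss


rgb = torch.rand(2, 3, 8, 8)
sparse = torch.rand(2, 1, 8, 8) * (torch.rand(2, 1, 8, 8) > 0.9).float()
target = torch.rand(2, 1, 8, 8)
aug_depth, loss = training_step(rgb, sparse, target)
print(aug_depth.requires_grad, plane_net.weight.grad is None, cnn.weight.grad is not None)
# False True True
```

The printed flags show the intended split: the augmented depth carries no autograd history, the preprocessing module gets no gradient, and the trained network still does.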

Answer 2 · 2022-01-24T09:51:46.000Z

Thank you for your kind answer.
I agree that the indexing should not affect the backpropagation.

Anyway, I've solved this issue by replacing

mask_for_class = mask == cls
normals = normal_image[mask == cls]

by

normal_image[mask == cls]

I think your solution will work as well.
Thank you again for your answer.
Best regards,
Jinwoo Jeon
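The thread does not establish why this rewrite helped. One way boolean-mask indexing can raise exactly this error is sketched below, assuming (hypothetically) that the named mask tensor is modified in place somewhere after the indexing: autograd saves the index tensor itself for the backward pass, so a later in-place write to that same object makes the saved copy stale, whereas an inline "mask == cls" hands autograd a fresh temporary that nothing else touches. The shapes, class value and in-place edit are illustrative only.

```python
import torch


def masked_normals(share_mask: bool) -> str:
    """Return the backward error message, or "ok" if backward succeeds."""
    normal_image = torch.randn(4, 4, 3, requires_grad=True)
    mask = torch.arange(16).reshape(4, 4) % 3  # toy per-pixel class labels
    cls = 1
    mask_for_class = mask == cls
    if share_mask:
        normals = normal_image[mask_for_class]  # autograd saves this exact tensor
    else:
        normals = normal_image[mask == cls]  # autograd saves a fresh temporary
    # Hypothetical later in-place edit of the named mask (e.g. reusing it for
    # another step); this bumps mask_for_class's version counter.
    mask_for_class[0, 0] = True
    try:
        normals.sum().backward()
    except RuntimeError as err:
        return str(err)
    return "ok"


print(masked_normals(share_mask=True))   # message contains "modified by an inplace operation"
print(masked_normals(share_mask=False))  # ok
```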

Answer 3 · 2022-02-01T07:23:16.000Z

Glad it works. If you want to use this for training, though, I'd recommend generating the incomplete depth first and then training on the recorded data. This is much faster and is actually how we originally trained the network. (Also, given the error above, I'm not sure whether PyTorch is computing gradients for the plane detection and the other preprocessing steps, which we don't actually want.)
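A rough sketch of that precompute-then-train workflow, assuming a simple one-file-per-sample layout: make_incomplete_depth, precompute and CachedDepthDataset are hypothetical names, and the random-dropout sparsification only stands in for the repository's plane-based generation. The augmentation runs once under torch.no_grad(), and training only reads the saved tensors, so no graph for the plane detection can ever be built.

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, Dataset


def make_incomplete_depth(depth: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the plane-based incomplete-depth generation."""
    keep = (torch.rand_like(depth) > 0.95).to(depth.dtype)
    return depth * keep


def precompute(depths, out_dir: str) -> None:
    """Run the augmentation once, offline, and save each sample to disk."""
    with torch.no_grad():  # nothing here should ever be part of a training graph
        for i, depth in enumerate(depths):
            sample = {"depth": depth, "incomplete": make_incomplete_depth(depth)}
            torch.save(sample, os.path.join(out_dir, f"{i:06d}.pt"))


class CachedDepthDataset(Dataset):
    """Serves the recorded (incomplete, full) depth pairs during training."""

    def __init__(self, root: str):
        self.paths = sorted(
            os.path.join(root, name) for name in os.listdir(root) if name.endswith(".pt")
        )

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int):
        sample = torch.load(self.paths[idx])
        return sample["incomplete"], sample["depth"]


def demo():
    """Precompute four toy samples, then iterate over them in batches of two."""
    with tempfile.TemporaryDirectory() as root:
        precompute([torch.rand(1, 8, 8) for _ in range(4)], root)
        dataset = CachedDepthDataset(root)
        shapes = [tuple(inc.shape) for inc, _ in DataLoader(dataset, batch_size=2)]
        return len(dataset), shapes


print(demo())  # (4, [(2, 1, 8, 8), (2, 1, 8, 8)])
```

Besides removing the preprocessing from the autograd picture entirely, caching means the expensive plane construction is paid once rather than every epoch, which matches the speed argument above.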