A problem about training.
Hello, thanks for your great work. However, when I run your train.py file, I get the error below. Could you help me solve this problem? Thanks!
Traceback (most recent call last):
  File "train.py", line 218, in <module>
    train()
  File "train.py", line 141, in train
    d_fake = generator_step(net_d, out2, net_loss, optimizer)
  File "/userhome/point-cloud/VRCNet/utils/train_utils.py", line 42, in generator_step
    total_gen_loss_batch.backward(torch.ones(torch.cuda.device_count()).cuda(), retain_graph=True, )
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 120, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    grad_tensors = _make_grads(tensors, grad_tensors)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 29, in _make_grads
    + str(out.shape) + ".")
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1]) and output[0] has a shape of torch.Size([]).
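(For reference, the mismatch itself is easy to reproduce outside the repository: backward() on a 0-dimensional loss rejects a gradient tensor of shape [1]. A minimal standalone CPU sketch, not taken from the VRCNet code:)

import torch

# A 0-dimensional (scalar) loss, as produced when training on a single GPU.
loss = (torch.randn(4, requires_grad=True) ** 2).mean()
print(loss.shape)  # torch.Size([])

# Passing a gradient of shape [1] to a scalar output triggers the same
# "Mismatch in shape" RuntimeError as in the traceback above.
try:
    loss.backward(torch.ones(1))
except RuntimeError as err:
    print(err)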
Hi,
I am having the same issue. I use Python 3.8, PyTorch 1.8.1, and CUDA 11.
Does anyone have a fix for this?
Thanks
Hi,
I found that the issue may occur when training with only one GPU; you could try using two or more.
Best regards.
Hi,
Thanks for your input, it helped fix the issue. If you only have one GPU, PyTorch returns the loss as a scalar (a 0-dimensional tensor) rather than a vector of per-GPU losses, but torch.ones(torch.cuda.device_count()) always creates a 1-dimensional gradient tensor.
The edge case can be handled either by dropping the torch.ones(...) argument entirely or by squeezing it, e.g. by changing line 134 in train.py to:
net_loss.backward(torch.squeeze(torch.ones(torch.cuda.device_count())).cuda())
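To illustrate why the squeeze works (a standalone CPU sketch with .cuda() dropped so it runs anywhere; backward_with_ones is just an illustrative helper, not a function from the repository): with multiple GPUs, DataParallel gathers one loss per device, so the loss has shape [N] and torch.ones(N) matches it, while with one GPU the loss is 0-dimensional and squeezing torch.ones(1) yields a matching 0-dimensional gradient.

import torch

def backward_with_ones(loss: torch.Tensor, num_devices: int) -> None:
    # Mirrors the workaround above: torch.ones(num_devices) is 1-dimensional,
    # so squeeze it down to a scalar when there is only one device and the
    # loss is therefore 0-dimensional.
    grad = torch.squeeze(torch.ones(num_devices))
    loss.backward(grad, retain_graph=True)

# Multi-GPU case: DataParallel returns one loss value per device -> shape [N].
multi_gpu_loss = torch.randn(2, requires_grad=True) ** 2
backward_with_ones(multi_gpu_loss, num_devices=2)   # grad shape [2] matches [2]

# Single-GPU case: the loss is a plain scalar -> shape [].
single_gpu_loss = (torch.randn(4, requires_grad=True) ** 2).mean()
backward_with_ones(single_gpu_loss, num_devices=1)  # squeezed grad is 0-dim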
It helped, thanks!