Python runtime error when training on an image.

Question

Python runtime error when training on an image.

Opened this issue 4 months ago · 1 comments

I have downloaded the source code and installed the prerequisites. When I try to train on an image, I get the following error:

##############################################################
python main_train.py --input_name micro.TIF
Random Seed: 5104
GeneratorConcatSkip2CleanAdd(
(head): ConvBlock(
(conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
(norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
(body): Sequential(
(block1): ConvBlock(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
(norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
(block2): ConvBlock(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
(norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
(block3): ConvBlock(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
(norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
)
(tail): Sequential(
(0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1))
(1): Tanh()
)
)
WDiscriminator(
(head): ConvBlock(
(conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
(norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
(body): Sequential(
(block1): ConvBlock(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
(norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
(block2): ConvBlock(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
(norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
(block3): ConvBlock(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
(norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
)
)
(tail): Conv2d(32, 1, kernel_size=(3, 3), stride=(1, 1))
)
Traceback (most recent call last):
File "/home/nunyo/Downloads/SinGAN-master/main_train.py", line 29, in
train(opt, Gs, Zs, reals, NoiseAmp)
File "/home/nunyo/Downloads/SinGAN-master/SinGAN/training.py", line 39, in train
z_curr,in_s,G_curr = train_single_scale(D_curr,G_curr,reals,Gs,Zs,in_s,NoiseAmp,opt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nunyo/Downloads/SinGAN-master/SinGAN/training.py", line 178, in train_single_scale
errG.backward(retain_graph=True)
File "/home/nunyo/.local/lib/python3.12/site-packages/torch/_tensor.py", line 525, in backward
torch.autograd.backward(
File "/home/nunyo/.local/lib/python3.12/site-packages/torch/autograd/init.py", line 267, in backward
_engine_run_backward(
File "/home/nunyo/.local/lib/python3.12/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 32, 3, 3]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

###############################################################################

I'm not sure what having torch.cuda.FloatTensor at version 2 means, and I'm not sure what to do about it. Any help would be appreciated!

Thanks,

billo

Answer 1 · 2024-10-14T15:24:39.000Z

I have the same problem!