tamarott/SinGAN

RuntimeError: CUDA out of memory

Opened this issue · 8 comments

First of all, thank you for your wonderful work.
I am training with animation.py, and after scale 7 I get the error below. How can I solve it? Thanks!

scale 7:[1975/2000]
scale 7:[1999/2000]
GeneratorConcatSkip2CleanAdd(
  (head): ConvBlock(
    (conv): Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Sequential(
    (0): Conv2d(128, 3, kernel_size=(3, 3), stride=(1, 1))
    (1): Tanh()
  )
)
WDiscriminator(
  (head): ConvBlock(
    (conv): Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Conv2d(128, 1, kernel_size=(3, 3), stride=(1, 1))
)
Traceback (most recent call last):
  File "main_train.py", line 29, in <module>
    train(opt, Gs, Zs, reals, NoiseAmp)
  File "C:\Users\Wooks\Source\ml_khan_20185057\SinGAN\SinGAN\training.py", line 39, in train
    z_curr,in_s,G_curr = train_single_scale(D_curr,G_curr,reals,Gs,Zs,in_s,NoiseAmp,opt)
  File "C:\Users\Wooks\Source\ml_khan_20185057\SinGAN\SinGAN\training.py", line 162, in train_single_scale
    gradient_penalty.backward()
  File "C:\Users\Wooks\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\Wooks\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 2.00 GiB total capacity; 1.14 GiB already allocated; 9.49 MiB free; 177.34 MiB cached)
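
On a 2 GiB card the finer scales simply may not fit with the default settings. Below is a minimal sketch of two common workarounds; the `--max_size` flag and the `D_curr`/`train_single_scale` names are taken from SinGAN's config.py and training.py as I understand them (treat them as assumptions and check your copy), not a verified patch:

```python
# Sketch only - verify the option names against config.py in your checkout.

# (a) Train on a smaller image so the finest scales need less activation memory:
#
#     python main_train.py --input_name your_image.png --max_size 160
#
# (b) Release the finished per-scale critic before moving on to the next scale,
#     e.g. right after train_single_scale(...) returns in training.py:
import torch

D_curr = None               # drop the reference to the per-scale discriminator
torch.cuda.empty_cache()    # hand cached but unused GPU blocks back to the driver
```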

Hello! I ran into the same problem when running main_train.py, but only after adding an attention layer to both the generator and the discriminator; the original code trained without any issues. Can a single attention layer really cause the GPU to run out of memory? Thank you, and wish you a happy life!

@markstrefford ...ran into a similar issue; I have 6 GiB of GPU memory and was training on a 1024x1024 pixel image...

Attention layers consume a lot of memory. You can try pooling or another mechanism to shrink the attention matrix and bring the memory usage down.
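
For reference, here is a minimal sketch of what "pooling to shrink the attention matrix" can look like; this is a generic PyTorch layer, not code from SinGAN, and the class name and stride are just illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PooledSelfAttention(nn.Module):
    """Self-attention whose keys/values are average-pooled by `stride`,
    so the attention matrix is (H*W) x (H*W / stride^2) instead of (H*W)^2."""
    def __init__(self, channels, stride=4):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.stride = stride
        self.gamma = nn.Parameter(torch.zeros(1))  # start as an identity residual

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)             # B x HW x C'
        k = self.k(F.avg_pool2d(x, self.stride)).flatten(2)  # B x C' x hw
        v = self.v(F.avg_pool2d(x, self.stride)).flatten(2)  # B x C  x hw
        attn = torch.softmax(q @ k, dim=-1)                  # B x HW x hw
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)    # B x C x H x W
        return x + self.gamma * out
```

With stride 4 the attention matrix has roughly 16x fewer entries than full self-attention, which is what saves the memory.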

@victorca25 Thank you for your idea, it has helped me a lot. Wish you a happy life!

I would like to know why memory usage keeps increasing when training the model at finer scales. Since the parameters of the previous models are fixed, I don't understand where the extra memory goes.
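
As far as I understand, even with earlier generators frozen, each finer scale works on a larger image, so the activation maps and the gradient-penalty graph grow with it. One way to confirm where the growth happens is to log GPU memory at the start of each scale; a minimal sketch (the `Gs` name is just the generator list from training.py, used here as a scale index):

```python
import torch

def log_gpu_memory(tag):
    """Print current and peak allocation to see which scale pushes past the limit."""
    mib = 2 ** 20
    print(f"[{tag}] allocated {torch.cuda.memory_allocated() / mib:.1f} MiB, "
          f"peak {torch.cuda.max_memory_allocated() / mib:.1f} MiB")
    torch.cuda.reset_max_memory_allocated()  # track the peak per scale, not globally

# e.g. at the top of train_single_scale():
# log_gpu_memory(f"scale {len(Gs)}")
```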

@ahmadxon : How did you solve the "out of memory" error?

I just used the Google Colab platform and ran it there.
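
If you go the Colab route, it is worth checking which GPU you were assigned before training; a quick check (plain PyTorch, nothing SinGAN-specific):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB")  # e.g. a T4 reports ~15 GiB
else:
    print("No GPU assigned - switch the Colab runtime type to GPU.")
```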