NVlabs/denoising-diffusion-gan

RuntimeError: CUDA out of memory.

KomputerMaster64 opened this issue · 3 comments

I have tried sampling/evaluating/testing the model on Google Colab as well as on a local GPU node, but I am facing a CUDA out of memory error in both environments.
Error on Google Colab:

Traceback (most recent call last):
  File "test_ddgan.py", line 272, in <module>
    sample_and_test(args)
  File "test_ddgan.py", line 186, in sample_and_test
    fake_sample = sample_from_model(pos_coeff, netG, args.num_timesteps, x_t_1,T,  args)
  File "test_ddgan.py", line 123, in sample_from_model
    x_0 = generator(x, t_time, latent_z)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/Repositories/denoising-diffusion-gan/score_sde/models/ncsnpp_generator_adagn.py", line 322, in forward
    h = modules[m_idx](hs[-1], temb, zemb)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/Repositories/denoising-diffusion-gan/score_sde/models/layerspp.py", line 300, in forward
    h = self.act(self.GroupNorm_1(h, zemb))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/Repositories/denoising-diffusion-gan/score_sde/models/layerspp.py", line 61, in forward
    out = gamma * out + beta
RuntimeError: CUDA out of memory. Tried to allocate 3.12 GiB (GPU 0; 14.76 GiB total capacity; 12.95 GiB already allocated; 887.75 MiB free; 12.95 GiB reserved in total by PyTorch)

Error on the local GPU node:

Traceback (most recent call last):
  File "test_ddgan.py", line 272, in <module>
    sample_and_test(args)
  File "test_ddgan.py", line 186, in sample_and_test
    fake_sample = sample_from_model(pos_coeff, netG, args.num_timesteps, x_t_1,T,  args)
  File "test_ddgan.py", line 123, in sample_from_model
    x_0 = generator(x, t_time, latent_z)
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/manisha.padala/gan/denoising-diffusion-gan/score_sde/models/ncsnpp_generator_adagn.py", line 322, in forward
    h = modules[m_idx](hs[-1], temb, zemb)
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/manisha.padala/gan/denoising-diffusion-gan/score_sde/models/layerspp.py", line 279, in forward
    h = self.act(self.GroupNorm_0(x, zemb))
  File "/home/manisha.padala/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/manisha.padala/gan/denoising-diffusion-gan/score_sde/models/layerspp.py", line 61, in forward
    out = gamma * out + beta
RuntimeError: CUDA out of memory. Tried to allocate 3.12 GiB (GPU 0; 10.76 GiB total capacity; 6.70 GiB already allocated; 3.06 GiB free; 6.70 GiB reserved in total by PyTorch)

In both cases the system could not allocate the requested 3.12 GiB.
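
A quick way to see what the caching allocator already holds right before the sampling call is to print PyTorch's memory counters (a diagnostic sketch, not part of the repo; nvidia-smi on the shell gives the same picture from the driver's side):

import torch

# Diagnostic sketch: report how much GPU memory PyTorch has already
# allocated and reserved just before the failing forward pass.
print("allocated : %.2f GiB" % (torch.cuda.memory_allocated() / 2**30))
print("reserved  : %.2f GiB" % (torch.cuda.memory_reserved() / 2**30))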

set --batch_size 100 in test_ddgan.py
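
If even a smaller batch does not fit, the samples can also be drawn in several small chunks and concatenated. This is only a sketch based on the call shown in the traceback (sample_from_model, netG, pos_coeff, T, args); the argument names num_channels and image_size are assumptions:

import torch

def sample_in_chunks(pos_coeff, netG, args, T, total=200, chunk=25, device="cuda"):
    # Draw `total` images in chunks of `chunk` so each forward pass fits in memory.
    images = []
    with torch.no_grad():  # no gradients are needed at test time
        for start in range(0, total, chunk):
            n = min(chunk, total - start)
            # fresh Gaussian noise x_T for this chunk
            x_t_1 = torch.randn(n, args.num_channels, args.image_size,
                                args.image_size, device=device)
            fake = sample_from_model(pos_coeff, netG, args.num_timesteps,
                                     x_t_1, T, args)
            images.append(fake.cpu())  # move results off the GPU immediately
    return torch.cat(images, dim=0)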

Thank you for the tip.
I wanted to know how to generate more images (say, 100 or 200 unique images), given that running the test file prints out the same images for the given batch size.

torch.manual_seed(42)

remove the seed
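
Concretely, test_ddgan.py fixes the RNG with torch.manual_seed(42), so every run draws the same latent noise and therefore the same images. A hedged sketch of the alternative (the nz=100 latent size and the 3x32x32 image shape below are illustrative assumptions):

import torch

# Either delete the torch.manual_seed(42) line, or reseed from fresh entropy
# so every run produces different noise and hence different images.
seed = torch.seed()  # picks a non-deterministic seed and applies it
print("sampling with seed", seed)

latent_z = torch.randn(100, 100, device="cuda")        # (batch, nz); nz=100 is an assumption
x_t_1 = torch.randn(100, 3, 32, 32, device="cuda")     # (batch, C, H, W); shape is an assumption

Combined with the chunked sampling above, each chunk gets fresh noise, so accumulating 100 or 200 unique images is just a matter of setting `total`.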