grishavak/LIDIA-denoiser

CUDA out of memory error.

joaocps opened this issue · 2 comments

Congratulations on the excellent work! I was trying to test it with an RGB image, but I can't due to lack of GPU memory. Any suggestions?

Thank you very much!

Stacktrace:

Traceback (most recent call last):
  File "denoise_rgb.py", line 90, in <module>
    denoise_bw_func()
  File "denoise_rgb.py", line 63, in denoise_bw_func
    test_image_dn = process_image(nl_denoiser, test_image_n.to(device), opt.max_chunk)
  File "C:\Users\jcps\Desktop\AIMAGE-TEST\LIDIA-denoiser-master\LIDIA-denoiser-master\code\utils.py", line 77, in process_image
    image_dn = nl_denoiser(image_n, train=False, save_memory=True, max_chunk=max_chunk)
  File "C:\Users\jcps\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\jcps\Desktop\AIMAGE-TEST\LIDIA-denoiser-master\LIDIA-denoiser-master\code\modules.py", line 459, in forward
    image_dn = self.denoise_image(image_n, train, save_memory, max_chunk)
  File "C:\Users\jcps\Desktop\AIMAGE-TEST\LIDIA-denoiser-master\LIDIA-denoiser-master\code\modules.py", line 344, in denoise_image
    top_dist0, top_ind0 = self.find_nn(image_for_nn0, im_params0, self.patch_w)
  File "C:\Users\jcps\Desktop\AIMAGE-TEST\LIDIA-denoiser-master\LIDIA-denoiser-master\code\modules.py", line 289, in find_nn
    top_dist = torch.zeros(im_params['batches'], im_params['patches_h'],

RuntimeError: CUDA out of memory. Tried to allocate 70.87 GiB (GPU 0; 4.00 GiB total capacity; 868.09 MiB already allocated; 1.84 GiB free; 1.02 GiB reserved in total by PyTorch)

GPU: NVIDIA GeForce GTX 960M (4 GB)

@grishavak

Thank you for showing interest in my work, and sorry for the late reply! Unfortunately, 4 GB of GPU memory is not enough to run this code. I suggest running on the CPU instead, or using a GPU with more memory, such as an NVIDIA GTX 1080 Ti.
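
For reference, here is a minimal, self-contained sketch of forcing CPU execution in PyTorch. The model and image below are placeholders, not the repository's nl_denoiser or test_image_n from the traceback; the same .to(device) pattern would apply in denoise_rgb.py.

    # Sketch: run everything on the CPU instead of CUDA (slower, but avoids the 4 GB limit).
    import torch
    import torch.nn as nn

    device = torch.device('cpu')            # use 'cpu' instead of 'cuda'
    model = nn.Conv2d(3, 3, kernel_size=3, padding=1).to(device)   # placeholder model
    image = torch.randn(1, 3, 256, 256, device=device)             # placeholder noisy image

    with torch.no_grad():
        out = model(image)                  # runs entirely in system RAM
    print(out.shape)                        # torch.Size([1, 3, 256, 256])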

This code applies separable matrix multiplications (W1 X W2, where X is the input and W1, W2 are trainable matrices). This is an unusual operation for neural networks, so I suspect it is implemented inefficiently in the PyTorch or CUDA libraries.
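
For illustration, a small sketch of what such a separable product looks like in PyTorch when applied to a batch of patches. The shapes and variable names are assumptions for the example, not LIDIA's actual dimensions or code.

    # Sketch: separable multiplication W1 @ X @ W2 applied to every patch at once.
    import torch

    B, N, p = 4, 10_000, 5                  # batch size, number of patches, patch size (illustrative)
    X  = torch.randn(B, N, p, p)            # per-patch inputs
    W1 = torch.randn(p, p)                  # trainable matrix applied on the left
    W2 = torch.randn(p, p)                  # trainable matrix applied on the right

    # Broadcasting performs the two small matmuls for every patch in one call;
    # the intermediate (W1 @ X) already has the same B x N x p x p footprint as X,
    # so memory grows quickly with the number of patches.
    Y = W1 @ X @ W2
    print(Y.shape)                          # torch.Size([4, 10000, 5, 5])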

@joaocps