mseitzer/srgan

Out of memory

Closed this issue · 3 comments

Hi, thanks for the work, first of all.
I got an out-of-memory error. I have a GTX 1070 Ti GPU with 8 GB of memory, so how much memory does this app need?

I set up my environment following the guide in the README. When I run:
$ ./eval.py -i configs/srresnet.json resources/pretrained/srresnet.pth path/to/image.jpg
and
$ ./eval.py -i configs/srgan.json resources/pretrained/srgan.pth path/to/image.jpg
they both run out of memory. Here is the error output:

Running on GPU 0
Restored checkpoint from resources/pretrained/srgan.pth
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "eval.py", line 157, in <module>
main(sys.argv[1:])
File "eval.py", line 120, in main
data = runner.infer(loader)
File "/home/jhd/face_recognition/softwares/srgan/training/base_runner.py", line 128, in infer
_, data = self._val_step(loader, compute_metrics=False)
File "/home/jhd/face_recognition/softwares/srgan/training/adversarial_runner.py", line 294, in _val_step
prediction = self.gen(inp)
File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/jhd/face_recognition/softwares/srgan/models/srresnet.py", line 195, in forward
x = self.upsample(x + initial)
File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 277, in forward
self.padding, self.dilation, self.groups)
File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu:58

I guess you used an image which is too large.

This network is pretty memory intensive. For upscaling a 512x512 image, I saw a peak memory usage of 12 GB.

I don't know exactly what PyTorch needs to hold in memory. A rough lower bound on the memory the network needs is 2 * 16 * width * height * 256 * 4 bytes, which is based on the output size of the largest feature map the network computes (and which it needs to hold twice, as input and output). In addition, the network parameters must be held in memory, which adds another couple hundred megabytes.
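The lower bound above can be worked out directly. This is a minimal sketch that just evaluates that formula; the constants are taken from the explanation (16 = 4x upscaling in each dimension, 256 channels in the largest feature map, 4 bytes per float32, held twice):

```python
def min_activation_bytes(width, height):
    # Rough lower bound from the formula above: the largest feature map
    # has 16 * width * height spatial elements (after 4x upscaling) with
    # 256 channels, stored as float32 (4 bytes), and must be held twice
    # (as both input and output of a layer).
    return 2 * 16 * width * height * 256 * 4

gib = min_activation_bytes(512, 512) / 2**30
print(f"{gib:.1f} GiB")  # roughly 8 GiB for a 512x512 input
```

For a 512x512 image this gives about 8 GiB, which is consistent with the 12 GB peak observed in practice once parameters and framework overhead are added, and it explains why an 8 GB card runs out of memory.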

If you have enough RAM, you can try upscaling on your CPU using the -c '' switch. This will, of course, take longer than on the GPU. Another option would be to crop the image into parts and upscale each part individually.
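The tiling idea could be sketched like this. This is only an illustration, not code from the repo: `upscale` is a hypothetical stand-in for the model call (e.g. the SRGAN generator wrapped to take and return NumPy arrays), and naive tiling can leave visible seams at tile borders, which overlapping tiles with blending would mitigate:

```python
import numpy as np

def upscale_in_tiles(img, upscale, tile=128, scale=4):
    """Split img (H x W x C) into tiles, upscale each, and stitch the results.

    `upscale` must map an (h, w, C) array to a (scale*h, scale*w, C) array;
    memory usage is then bounded by the tile size instead of the image size.
    """
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = img[y:y + tile, x:x + tile]
            out[y * scale:(y + patch.shape[0]) * scale,
                x * scale:(x + patch.shape[1]) * scale] = upscale(patch)
    return out

# Example with nearest-neighbour repeat standing in for the network:
nearest = lambda p: p.repeat(4, axis=0).repeat(4, axis=1)
result = upscale_in_tiles(np.ones((300, 200, 3), dtype=np.uint8), nearest)
print(result.shape)  # (1200, 800, 3)
```

Edge tiles smaller than the tile size are handled by slicing with the actual patch shape, so non-multiple image dimensions work too.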

Thanks for the reply, it works when I use a smaller picture! Thank you very much.