longcw/faster_rcnn_pytorch

out of memory

manyuyuya opened this issue · 2 comments

Hello! When I run train.py, I get an out-of-memory error after a few epochs. It still happens even if I increase the number of GPUs, and I have found that other people have run into this as well. I don't understand its cause. Could you offer some help? Thank you very much!
The relevant output is below:

step 120, image: 005365.jpg, loss: 6.3531, fps: 3.71 (0.27s per batch)
TP: 0.00%, TF: 100.00%, fg/bg=(14/285)
rpn_cls: 0.6417, rpn_box: 0.0229, rcnn_cls: 1.9303, rcnn_box: 0.1354
step 130, image: 009091.jpg, loss: 4.8151, fps: 3.78 (0.26s per batch)
TP: 0.00%, TF: 100.00%, fg/bg=(22/277)
rpn_cls: 0.6486, rpn_box: 0.2012, rcnn_cls: 1.7988, rcnn_box: 0.1184
step 140, image: 008690.jpg, loss: 4.9961, fps: 3.55 (0.28s per batch)
TP: 0.00%, TF: 100.00%, fg/bg=(30/269)
rpn_cls: 0.6114, rpn_box: 0.0690, rcnn_cls: 1.4801, rcnn_box: 0.1088
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train.py", line 138, in
loss.backward()
File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/init.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
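
A common cause of GPU memory that grows step after step (independent of this repo) is keeping a reference to the loss tensor across iterations, which keeps each step's autograd graph alive. This is only a hedged sketch of that pattern, not this project's train.py; the tiny model and variable names are made up for illustration:

```python
import torch
import torch.nn as nn

# Toy stand-in for a real detector; only the loop pattern matters here.
model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
running_loss = 0.0

for step in range(1000):
    x = torch.randn(4, 10, device='cuda')
    y = torch.randn(4, 1, device='cuda')
    loss = nn.functional.mse_loss(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Leaky:  running_loss += loss      # holds every step's graph in GPU memory
    # Safe:   convert to a Python float first
    running_loss += loss.item()         # on 0.3.x use loss.data[0]
```

If the leak disappears with this change in your own loop, the problem is loss bookkeeping rather than a specific layer.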

Try PyTorch version 0.3.1 with cudatoolkit 8.0.
I also tried version 0.4.1 but hit the same error (there may be a GPU memory leak in the code), so I downgraded PyTorch.
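
To check whether a version change actually removes the leak, it helps to watch the allocator directly. This assumes a PyTorch recent enough (0.4+) to have torch.cuda.memory_allocated / max_memory_allocated; the log_gpu_memory helper is just an illustrative name:

```python
import torch

def log_gpu_memory(tag):
    # Bytes currently held by tensors vs. the peak since startup, in MB.
    alloc = torch.cuda.memory_allocated() / 1024 ** 2
    peak = torch.cuda.max_memory_allocated() / 1024 ** 2
    print('[%s] allocated: %.1f MB, peak: %.1f MB' % (tag, alloc, peak))
```

Calling log_gpu_memory('step %d' % step) every few iterations of the training loop should show "allocated" climbing steadily if something is leaking, and staying roughly flat otherwise.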

I think the memory leak is due to the RoI pooling layer, because when I copied the RoI pooling code into another project of mine, it also leaked GPU memory.
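
One way to back up that guess is to run an RoI pooling op by itself in a loop and watch allocated memory. The sketch below uses torchvision.ops.roi_pool as a stand-in, since I am not assuming the exact interface of this repo's custom RoIPool module; the shapes and boxes are arbitrary test values:

```python
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 256, 50, 50, device='cuda', requires_grad=True)
# Each RoI is (batch_index, x1, y1, x2, y2) in image coordinates.
rois = torch.tensor([[0,  0,  0, 100, 100],
                     [0, 50, 50, 200, 200]],
                    dtype=torch.float32, device='cuda')

for i in range(200):
    out = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
    out.sum().backward()
    feat.grad = None  # drop accumulated gradients on the leaf tensor
    if i % 20 == 0:
        print('iter %d, allocated %.1f MB'
              % (i, torch.cuda.memory_allocated() / 1024 ** 2))
```

If the allocated number keeps growing with the custom layer but stays flat with the stock op, that points at the layer's forward/backward not freeing its buffers.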