facebookarchive/fb.resnet.torch

Out of memory

ilichev-andrey opened this issue · 2 comments

Hello.

Error:

=> Training epoch # 1	
THCudaCheck FAIL file=/root/facedetect/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=65 error=2 : out of memory
/root/facedetect/torch/install/bin/luajit: .../facedetect/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 6 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 1 module of nn.Sequential:
/root/facedetect/torch/install/share/lua/5.1/nn/THNN.lua:110: cuda runtime error (2) : out of memory at /root/facedetect/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:65

Memory-Usage: 1789MiB / 2002MiB (luajit - 1510MiB)

Why isn't there enough memory?

My data (49.5 MB):

  • train
  • val

Model (imagenet): resnet-18 (188.4 MB)

batchSize 256


batchSize 32 - all good

Memory-Usage: 984MiB / 2002MiB (luajit - 670MiB)

I don't understand why my data takes up so much memory.

q121q commented

Internally, Torch uses far more memory than the sum of all weights and biases: the activations of every layer (and their gradients) must also be kept for the backward pass, and that storage grows linearly with the batch size, which is why you see this increased memory usage. Lowering your batchSize is usually the way to go.
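A rough, hypothetical back-of-the-envelope sketch of that scaling (the parameter count is ResNet-18's ~11.7M; the per-image activation count is an illustrative round number, not a measured value, and the real footprint also includes cuDNN workspaces and allocator overhead):

```python
# Illustrative estimate only: why batch size, not the dataset or the
# checkpoint file, dominates GPU memory during training.
BYTES_PER_FLOAT = 4

# ResNet-18 has ~11.7M parameters; gradients mirror them one-to-one.
param_mem = 11.7e6 * BYTES_PER_FLOAT
grad_mem = param_mem

# Hypothetical: ~5M activation floats per image, summed over all layers,
# each kept alive until the backward pass consumes it.
act_floats_per_image = 5e6

def training_mem_mb(batch_size):
    """Crude training footprint in MiB for a given batch size."""
    acts = batch_size * act_floats_per_image * BYTES_PER_FLOAT
    return (param_mem + grad_mem + acts) / 1024**2

print("batchSize 32 :", round(training_mem_mb(32)), "MiB")
print("batchSize 256:", round(training_mem_mb(256)), "MiB")
```

Under these made-up numbers, batchSize 32 fits comfortably on a 2 GB card while batchSize 256 is several times over budget, matching the behavior reported above.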

I reduced my batchSize, but when the batchSize is too small, such as 20, the accuracy is lower. I think it is because of batch norm. How can I avoid it?
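That guess is plausible: batch norm normalizes each activation with the mean and std of the current mini-batch, and with a small batch those statistics are noisy estimates of the true distribution. A small stdlib-only sketch (not from this thread; the synthetic "activations" are a stand-in for a real layer's outputs) showing how much noisier the per-batch mean is at batch size 20 than at 256:

```python
import random
import statistics

random.seed(0)

# Stand-in for one unit's pre-normalization activations over a dataset:
# Gaussian with true mean 1.0 and true std 2.0.
activations = [random.gauss(1.0, 2.0) for _ in range(100_000)]

def batch_mean_noise(batch_size, trials=2000):
    """Std-dev of the per-batch mean across many sampled mini-batches.

    This is the jitter batch norm's shift estimate sees at that batch size.
    """
    means = [statistics.fmean(random.sample(activations, batch_size))
             for _ in range(trials)]
    return statistics.pstdev(means)

print("batch 20 :", batch_mean_noise(20))
print("batch 256:", batch_mean_noise(256))
```

The jitter shrinks roughly like 1/sqrt(batch size), so the statistics at batch 20 are several times noisier than at batch 256, which perturbs training.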