RuntimeError: CUDA error: out of memory on FCN8 VGG
kvananth opened this issue · 6 comments
Hey - thanks for the implementation. It's very helpful.
I get RuntimeError: CUDA error: out of memory when training FCN8s VGG16 with batch_size=1 running on a 1080i 12GB GPU.
Memory consumption fluctuates a lot, from about 4 GB to 11.5 GB, during training, and I wonder why that happens.
I'd greatly appreciate if you could point me to things I can play with to get it under control.
Thank you.
Adding a garbage-collector call helped a bit.
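In case it is useful to others, a minimal sketch of that kind of cleanup (assuming a standard PyTorch setup; the helper name is mine, not from this repo):

import gc
import torch

def free_cached_memory():
    # Drop unreachable Python objects, then ask PyTorch's caching
    # allocator to release unused blocks back to the GPU driver.
    gc.collect()
    torch.cuda.empty_cache()

# e.g. call free_cached_memory() at the end of each iteration or epoch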
How did you solve this issue? Thx
Hi, I'm having some trouble training on the GPU because of CUDA error: out of memory. If you have any tips to avoid it, that would be great.
I am also hitting this problem. How did you handle it?
Hi everyone!
@lemon1210
First of all, I'm sorry for my poor English. If you don't understand something just let me know and I'll try to explain.
I found some workarounds that helped me:
1) First, I used pinned memory in the DataLoader (it might not help at all; as I understand it, pinned memory lives on the CPU side) together with asynchronous copies to the GPU. Finally, I call torch.cuda.empty_cache() in every iteration.
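A minimal sketch of a GetIterator built this way (the dataset and worker count are just placeholders; pin_memory=True is the relevant knob):

from torch.utils.data import DataLoader

def GetIterator(dataset, batch_size, num_workers):
    # pin_memory=True keeps host batches in page-locked memory so the
    # later .to(device, non_blocking=True) copy can overlap with compute.
    loader = DataLoader(dataset,
                        batch_size=batch_size,
                        shuffle=True,
                        num_workers=num_workers,
                        pin_memory=True)
    return iter(loader)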
Here is a bit of the code I'm actually using:
import torch
from time import time

iterString = "===> Epoch[{}]({}/{}): Loss: {:.4f}"
epochString = "===> Epoch {} Complete: Avg. Loss: {:.4f} time: {:.1f}"

numIterations = 50
LOSS = []

def train(epoch):
    s1 = time()
    epoch_loss = 0
    for iteration in range(numIterations):
        try:
            batch = next(iterator)
        except (NameError, StopIteration):
            # (Re)build the iterator on first use or when it is exhausted.
            iterator = GetIterator(dtset, batch_size, 3)
            batch = next(iterator)

        # Crop to the desired size and copy to the GPU asynchronously
        # (non_blocking=True only overlaps when the batch is in pinned memory).
        inpt = (batch['image'][:, :, :desiredSize[0], :desiredSize[1]]
                .contiguous().float().to(device, non_blocking=True))
        target = (batch['labl'][:, :desiredSize[0], :desiredSize[1]]
                  .contiguous().float().to(device, non_blocking=True))

        pred = model(inpt)

        # Some criteria expect float targets, others long class indices.
        try:
            loss = criterion(pred, target.float())
        except (RuntimeError, TypeError):
            loss = criterion(pred, target.long())

        # Keep only a detached CPU copy so the computation graph is not retained.
        LOSS.append(loss.detach().cpu())
        epoch_loss += loss.item()

        # Standard update: clear old gradients, backpropagate, then step.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print(iterString.format(epoch,
                                iteration,
                                numIterations,
                                loss.item()))

        # Release unused cached blocks so the allocator's peak stays lower.
        torch.cuda.empty_cache()

    print(epochString.format(epoch,
                             epoch_loss / numIterations,
                             time() - s1))
2) The other workaround (I'm not using it now) was putting the input, output, and model inside a single object.
This avoids the GPU allocating the input, the output, and the model gradients twice when you make a new pass, because the old tensors are released as soon as their attributes are overwritten.
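A rough sketch of that idea (the TrainState name is hypothetical; this is just my interpretation, where overwriting the attributes each step drops the previous iteration's tensors instead of keeping two copies alive):

class TrainState:
    # Holds the model plus the current input/prediction/loss so the old
    # tensors become unreferenced (and their GPU memory reusable) as soon
    # as they are replaced.
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        self.inpt = None
        self.pred = None
        self.loss = None

    def step(self, inpt, target):
        self.inpt = inpt
        self.pred = self.model(self.inpt)
        self.loss = self.criterion(self.pred, target)

        self.optimizer.zero_grad()
        self.loss.backward()
        self.optimizer.step()
        return self.loss.item()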
Lastly, I used the seg_net model with number of classes = 20, batch_size = 1, and image size = (720, 960, 3).
The training runs on a GTX 980 (4 GB).
And for evaluation I recommend deactivating gradients:

for pars in model.parameters():
    pars.requires_grad = False
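An alternative to flipping requires_grad (not part of the workaround above, just standard PyTorch) is to run the whole evaluation loop under torch.no_grad(), so no autograd graph is built at all; val_loader here is a hypothetical DataLoader:

model.eval()
with torch.no_grad():
    for batch in val_loader:
        pred = model(batch['image'].float().to(device, non_blocking=True))
        # ... compute metrics; no activations are retained for backward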
Hope this helps!
@lemon1210 Hope this helps! Let me know!