zijundeng/pytorch-semantic-segmentation

RuntimeError: CUDA error: out of memory on FCN8 VGG

kvananth opened this issue · 6 comments

Hey - thanks for the implementation. It's very helpful.

I get RuntimeError: CUDA error: out of memory when training FCN8s VGG16 with batch_size=1, running on a 1080 Ti 12 GB GPU.

Memory consumption fluctuates a lot during training, from 4 GB to 11.5 GB. I wonder why this happens.

I'd greatly appreciate it if you could point me to things I can tweak to get it under control.

Thank you.

Adding a garbage-collection call helped a bit.
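
For reference, this is roughly the kind of thing that means; a minimal sketch, where train_loader, model, criterion, optimizer and device are placeholders and not names from this repo:

import gc
import torch

# Hypothetical loop; the point is only the cleanup at the end of each step.
for inputs, targets in train_loader:
    outputs = model(inputs.to(device))
    loss = criterion(outputs, targets.to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Drop the references that keep the graph alive, then force a collection
    # so the caching allocator can actually reuse or release those blocks.
    del outputs, loss
    gc.collect()
    torch.cuda.empty_cache()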

How did you solve this issue? Thx

Hi, I'm having some trouble training on the GPU because of CUDA error: out of memory. If you have any tips to avoid it, that would be great.

I also ran into this problem. How did you handle it?

Hi everyone!
@lemon1210

First of all, I'm sorry for my poor English. If you don't understand something just let me know and I'll try to explain.

I found some workarounds that helped me:

1) First, I used pinned memory in the DataLoader (it might not help at all; as I understand it, pinned memory only affects the CPU side) together with asynchronous copies to the GPU, and finally I call torch.cuda.empty_cache() on every iteration.
Here is a bit of the code I'm currently using:

import torch
from time import time

iterString  = "===> Epoch[{}]({}/{}): Loss: {:.4f}"
epochString = "===> Epoch {} Complete: Avg. Loss: {:.4f} time: {:.1f}"

# Crop to desiredSize, make the tensor contiguous, cast to float and copy it
# to the GPU asynchronously (non_blocking=True pairs with pin_memory=True in
# the DataLoader).
inFormatAndtoDev = "[:, :, :desiredSize[0], :desiredSize[1]].contiguous()." + \
                   "float().to(device, non_blocking=True)"

gtFormatAndtoDev = "[:,    :desiredSize[0], :desiredSize[1]].contiguous()." + \
                   "float().to(device, non_blocking=True)"

_inShort = "batch['image']"
_gtShort = "batch['labl' ]"

numIterations = 50
LOSS = []

# model, criterion, optimizer, dtset, batch_size, desiredSize and device are
# defined elsewhere in the script.
def train(epoch):
    s1 = time()
    epoch_loss = 0
    for iteration in range(numIterations):
        try:
            batch = next(iterator)
        except (NameError, StopIteration):
            # (Re)build the iterator on the first pass or once it is exhausted.
            iterator = GetIterator(dtset, batch_size, 3)
            batch = next(iterator)

        inpt   = eval(_inShort + inFormatAndtoDev)
        target = eval(_gtShort + gtFormatAndtoDev)
        pred   = model(inpt)
        try:
            loss = criterion(pred, target.float())
        except RuntimeError:
            loss = criterion(pred, target.long())

        # Keep only a detached CPU copy so the loss history does not hold the
        # whole computation graph in GPU memory.
        LOSS.append(loss.detach().cpu())
        epoch_loss += loss.item()

        optimizer.zero_grad()   # clear old gradients before the new backward pass
        loss.backward()
        optimizer.step()

        print(iterString.format(epoch,
                                iteration,
                                numIterations,
                                loss.item()))

        # Return unused cached blocks so peak memory stays lower.
        torch.cuda.empty_cache()

    print(epochString.format(epoch,
                             epoch_loss / numIterations,
                             time() - s1))
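
The GetIterator helper isn't shown above; a minimal sketch of what it could look like (assuming dtset is a standard torch.utils.data.Dataset), with pin_memory=True enabled so the non_blocking copies in the loop can overlap with compute:

from torch.utils.data import DataLoader

def GetIterator(dataset, batch_size, num_workers):
    # pin_memory=True keeps batches in page-locked host memory, which is what
    # makes .to(device, non_blocking=True) an asynchronous copy.
    loader = DataLoader(dataset,
                        batch_size=batch_size,
                        shuffle=True,
                        num_workers=num_workers,
                        pin_memory=True)
    return iter(loader)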

2) The other workaround (I'm not using it now) was putting the input, the output and the model inside a single object.
This keeps the GPU from allocating the input, the output and the model gradients twice when you make a new pass; a rough sketch follows below.
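
A rough sketch of that idea, under my own naming (TrainState, run_step and the preallocated buffer are illustrative, not taken from the repo):

import torch

class TrainState:
    """Holds the model and reusable input/output references in one place."""
    def __init__(self, model, criterion, optimizer, input_shape, device):
        self.model = model.to(device)
        self.criterion = criterion
        self.optimizer = optimizer
        # Preallocate the input buffer once and copy each batch into it,
        # instead of creating a fresh GPU tensor every iteration.
        self.inpt = torch.zeros(input_shape, device=device)
        self.pred = None

    def run_step(self, batch_images, batch_labels):
        self.inpt.copy_(batch_images, non_blocking=True)
        self.pred = self.model(self.inpt)
        loss = self.criterion(self.pred, batch_labels.to(self.inpt.device))
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()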

Last of all, I used the seg_net model with number of classes = 20, batch_size = 1 and imageSize = (720, 960, 3).
The training runs on a GTX 980 4 GB.

And for evaluation I recommend deactivating gradients:

for pars in model.parameters():
    pars.requires_grad = False
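
Another option (not what I did, just a standard PyTorch alternative) is to wrap the evaluation loop in torch.no_grad(), which disables gradient tracking without touching the parameters; val_iterator here is just a placeholder for whatever validation loader you use:

model.eval()
with torch.no_grad():
    for batch in val_iterator:
        pred = model(batch['image'].to(device, non_blocking=True))
        # ... compute metrics on pred; no graph is kept, so memory stays flat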

Hope this helps!

@lemon1210 Hope this helps! Lemme know!