How to reduce GPU memory usage?
RaphaelRoyerRivard opened this issue · 3 comments
During the first epoch, I get the following out-of-memory error:
```
Traceback (most recent call last):
  File "train_video_cycle_simple.py", line 352, in <module>
    main()
  File "train_video_cycle_simple.py", line 232, in main
    train_loss, theta_loss, theta_skip_loss = train(train_loader, model, criterion, optimizer, epoch, use_cuda, args)
  File "train_video_cycle_simple.py", line 290, in train
    outputs = model(imgs, patch2, img, theta)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 203, in forward
    r50_feat1, r50_feat1_pre, r50_feat1_norm = self.forward_base(videoclip1)
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 164, in forward_base
    x_pre = self.encoderVideo(x)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\root\Projects\TimeCycle\models\videos\inflated_resnet.py", line 35, in forward
    x = self.layer1(x)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\root\Projects\TimeCycle\models\videos\inflated_resnet.py", line 95, in forward
    out = self.conv3(out)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\conv.py", line 476, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 8.00 GiB total capacity; 5.63 GiB already allocated; 362.97 MiB free; 41.09 MiB cached)
> c:\logiciels\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\conv.py(476)forward()
-> self.padding, self.dilation, self.groups)
(Pdb)
```
The settings used are the default ones:

```
4
batchSize: 36
temperature: 0.04419417382415922
gridSize: 9
classNum: 49
videoLen: 4
0,1,2,3
False
self.T: 0.04419417382415922
Total params: 26.01M
weight_decay: 0.0
beta1: 0.5
```
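For scale, even the raw input clips of one default batch are sizeable before any feature maps or gradients are allocated. A rough back-of-the-envelope sketch, assuming float32 RGB frames of `cropSize` x `cropSize` (the names mirror the settings dump above):

```python
# Rough estimate of the raw input footprint per batch. Activations inside the
# inflated ResNet are many times larger, so the real peak is far higher.
batchSize, videoLen, cropSize = 36, 4, 240
channels, bytes_per_float = 3, 4  # RGB, float32
clip_bytes = batchSize * videoLen * channels * cropSize * cropSize * bytes_per_float
print(f"raw input clips: {clip_bytes / 2**20:.1f} MiB")  # ~95 MiB per batch
```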
What do I need to change to reduce the GPU memory requirements?
I have a similar problem.
In my case, a crop size of 320x320 works, but other crop sizes (400x400 or 480x480) do not.
My `imgSize` is already small (256). Still, I reduced `cropSize` from 240 to 128 and got the same error. Then I also changed the `batchSize` from 32 to 16, and the out-of-memory error was replaced by an invalid shape error:
File "train_video_cycle_simple.py", line 352, in <module>
main()
File "train_video_cycle_simple.py", line 232, in main
train_loss, theta_loss, theta_skip_loss = train(train_loader, model, criterion, optimizer, epoch, use_cuda, args)
File "train_video_cycle_simple.py", line 290, in train
outputs = model(imgs, patch2, img, theta)
File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 218, in forward
corrfeat1, corrfeat_trans_matrix2, corrfeat_trans1, trans_out2 = self.compute_transform_img_to_patch(patch_feat2_norm, r50_feat1_norm, temporal_out=self.temporal_out)
File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 183, in compute_transform_img_to_patch
corrfeat = self.compute_corr_softmax(query, base, detach_corrfeat=detach_corrfeat)
File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 109, in compute_corr_softmax
corrfeat = corrfeat.view(corrfeat.size(0), T, self.spatial_out1 * self.spatial_out1, self.spatial_out2, self.spatial_out2)
RuntimeError: shape '[16, 4, 900, 10, 10]' is invalid for input of size 1638400
> c:\users\root\projects\timecycle\models\videos\model_simple.py(109)compute_corr_softmax()
-> corrfeat = corrfeat.view(corrfeat.size(0), T, self.spatial_out1 * self.spatial_out1, self.spatial_out2, self.spatial_out2)
I really don't know where this shape and that enormous input size come from... Does somebody have a clue about how to fix it?
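For what it's worth, the numbers in that error are consistent with `spatial_out1` being hard-coded for the default `cropSize` of 240 rather than recomputed from the actual input size. A sketch of the arithmetic, under the assumption that the encoder has an effective feature stride of 8 (every figure in the message matches):

```python
# Where the numbers in the error come from, assuming a feature stride of 8:
#   cropSize 240 -> 240 / 8 = 30, so spatial_out1**2 = 900 (the hard-coded default)
#   cropSize 128 -> 128 / 8 = 16, so the tensor actually holds 16 * 16 = 256 positions
#   the patch branch is unchanged, giving spatial_out2 = 10 either way
batch, T = 16, 4
expected = batch * T * (30 * 30) * (10 * 10)  # 5,760,000 elements for cropSize 240
actual   = batch * T * (16 * 16) * (10 * 10)  # 1,638,400 elements, as reported
print(expected, actual)
```

If that reading is right, changing `cropSize` also requires updating `spatial_out1` in `model_simple.py` to match.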
`cropSize` cannot be changed from 240 without causing shape errors, so I had to lower `batchSize` to 4, and now my video card can fit a single batch into its 8 GB of RAM... But a single epoch will take more than 6 hours... This is ridiculous.
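One option that might avoid cutting the batch size this far is activation checkpointing, which recomputes intermediate activations during the backward pass instead of storing them, trading extra compute for memory. A minimal sketch, not a verified fix: the `encoderVideo` stage names below are assumptions based on the tracebacks above and a standard (inflated) ResNet layout:

```python
# Sketch: wrap the heavy residual stages of the video encoder in
# torch.utils.checkpoint so their activations are recomputed on backward
# instead of kept in GPU memory for the whole forward pass.
from torch.utils.checkpoint import checkpoint

def forward_base_checkpointed(encoder, x):
    # Run the stem normally; its output requires grad, so checkpointing
    # the later stages will still propagate gradients.
    x = encoder.conv1(x)
    x = encoder.bn1(x)
    x = encoder.relu(x)
    x = encoder.maxpool(x)
    # Checkpoint the expensive stages; each is recomputed once on backward.
    x = checkpoint(encoder.layer1, x)
    x = checkpoint(encoder.layer2, x)
    x = checkpoint(encoder.layer3, x)
    return x
```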