xiaolonw/TimeCycle

How to reduce GPU memory usage?

RaphaelRoyerRivard opened this issue · 3 comments

During the first epoch, I get the following out-of-memory error:

Traceback (most recent call last):
  File "train_video_cycle_simple.py", line 352, in <module>
    main()
  File "train_video_cycle_simple.py", line 232, in main
    train_loss, theta_loss, theta_skip_loss = train(train_loader, model, criterion, optimizer, epoch, use_cuda, args)
  File "train_video_cycle_simple.py", line 290, in train
    outputs = model(imgs, patch2, img, theta)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 203, in forward
    r50_feat1, r50_feat1_pre, r50_feat1_norm = self.forward_base(videoclip1)
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 164, in forward_base
    x_pre = self.encoderVideo(x)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\root\Projects\TimeCycle\models\videos\inflated_resnet.py", line 35, in forward
    x = self.layer1(x)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\root\Projects\TimeCycle\models\videos\inflated_resnet.py", line 95, in forward
    out = self.conv3(out)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\conv.py", line 476, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 8.00 GiB total capacity; 5.63 GiB already allocated; 362.97 MiB free; 41.09 MiB cached)
> c:\logiciels\anaconda3\envs\torch\lib\site-packages\torch\nn\modules\conv.py(476)forward()
-> self.padding, self.dilation, self.groups)
(Pdb)

The settings used are the defaults:

4
batchSize: 36
temperature: 0.04419417382415922
gridSize: 9
classNum: 49
videoLen: 4
0,1,2,3
False
self.T: 0.04419417382415922
    Total params: 26.01M
weight_decay: 0.0
beta1: 0.5

What do I need to change to reduce the GPU memory requirements?
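
For scale, the 26.01M parameters are only about 100 MB in fp32, so most of the 5.63 GiB already allocated must be activations, and those should scale roughly linearly with batchSize. A back-of-envelope sketch (my own arithmetic, assuming activations dominate and nothing else changes):

# Back-of-envelope only: assumes activation memory dominates and scales
# linearly with batch size; parameter memory (26.01M fp32 weights) is small.
params_gib = 26.01e6 * 4 / 2**30            # ~0.10 GiB of weights
activations_gib = 5.63 - params_gib         # from the OOM message above
for bs in (36, 18, 12):
    print(bs, round(activations_gib * bs / 36 + params_gib, 2), "GiB (approx)")

So dropping batchSize to 18 or 12 should bring the footprint down to roughly 2.9 or 1.9 GiB, at the cost of more iterations per epoch.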

I have a similar problem.
In my case, a crop size of 320x320 works, but larger crop sizes (400x400 or 480x480) run out of memory.
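
That is consistent with activation memory growing with crop area, since the parameter count stays fixed. A quick check, my arithmetic rather than anything from the repo:

# Relative activation footprint vs. the 320x320 crop that fits,
# assuming memory grows with crop area (height x width):
for crop in (320, 400, 480):
    print(f"{crop}x{crop}: {(crop / 320) ** 2:.2f}x")
# 320x320: 1.00x / 400x400: 1.56x / 480x480: 2.25x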

My imgSize is already small (256). Still, I reduced cropSize from 240 to 128 and got the same error. Then I also changed the batchSize from 32 to 16, and my out-of-memory error was replaced by an invalid shape error...

  File "train_video_cycle_simple.py", line 352, in <module>
    main()
  File "train_video_cycle_simple.py", line 232, in main
    train_loss, theta_loss, theta_skip_loss = train(train_loader, model, criterion, optimizer, epoch, use_cuda, args)
  File "train_video_cycle_simple.py", line 290, in train
    outputs = model(imgs, patch2, img, theta)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Logiciels\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 218, in forward
    corrfeat1, corrfeat_trans_matrix2, corrfeat_trans1, trans_out2 = self.compute_transform_img_to_patch(patch_feat2_norm, r50_feat1_norm, temporal_out=self.temporal_out)
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 183, in compute_transform_img_to_patch
    corrfeat = self.compute_corr_softmax(query, base, detach_corrfeat=detach_corrfeat)
  File "C:\Users\root\Projects\TimeCycle\models\videos\model_simple.py", line 109, in compute_corr_softmax
    corrfeat  = corrfeat.view(corrfeat.size(0), T, self.spatial_out1 * self.spatial_out1, self.spatial_out2, self.spatial_out2)
RuntimeError: shape '[16, 4, 900, 10, 10]' is invalid for input of size 1638400
> c:\users\root\projects\timecycle\models\videos\model_simple.py(109)compute_corr_softmax()
-> corrfeat  = corrfeat.view(corrfeat.size(0), T, self.spatial_out1 * self.spatial_out1, self.spatial_out2, self.spatial_out2)

I really don't know where this shape and that enormous input size come from... Does somebody have a clue about how to fix this?
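
For what it's worth, the two numbers in the error line up if spatial_out1 is fixed at 30, which matches cropSize 240 at an output stride of 8 (240 / 8 = 30), while a 128 crop gives a 16x16 feature map. This is a reconstruction from the numbers in this thread, an inference rather than a reading of the full source:

# What view() expects: spatial_out1 = 30, i.e. cropSize 240 / stride 8
batch, T, out2 = 16, 4, 10
print(batch * T * 30**2 * out2**2)           # 5760000 -> shape [16, 4, 900, 10, 10]
# What the tensor actually holds with cropSize 128 (128 / 8 = 16 per side):
print(batch * T * (128 // 8)**2 * out2**2)   # 1638400 -> the reported input size

If that reading is right, cropSize and the hard-coded spatial_out values have to stay consistent with each other.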

The cropSize cannot be changed from 240 without causing shape errors, so I had to lower the batchSize to 4, and now my video card can fit a single batch in its 8 GB of RAM... But a single epoch will take more than 6 hours... This is ridiculous
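
One general PyTorch technique that might help, though nothing in this repo wires it up and I have not tested it here, is gradient checkpointing: intermediate activations are recomputed during the backward pass instead of being stored, trading extra compute for a lower peak. A minimal self-contained demo of the mechanism on a toy encoder (not TimeCycle's):

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a heavy encoder stage (the inflated ResNet's
# self.layer1..layer4 are nn.Sequential, so they would be candidates):
blocks = nn.Sequential(*[
    nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
    for _ in range(8)
])
x = torch.randn(2, 64, 120, 120, requires_grad=True)
# Keep activations only at 4 segment boundaries; the rest are rebuilt
# when backward() re-runs each segment:
out = checkpoint_sequential(blocks, 4, x)
out.sum().backward()

Applying it here would mean wrapping the self.layer1(x)-style calls in inflated_resnet.py; whether it frees enough memory to raise batchSize above 4 would need measuring.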