szq0214/DSOD

What is your trick to overcome the GPU memory constraints?

gd2016229035 opened this issue · 5 comments

I can only train with a batch size of 6 on my single TITAN X (Pascal) without running out of memory. So what is your trick to overcome the GPU memory constraints in the paper?
Thank you~~

Hi @gd2016229035, 'accum_batch_size' is the effective batch size, and you can set it to a relatively large value. The trick is to accumulate gradients over two training iterations before each weight update, which is already implemented in Caffe. Many other methods, such as SSD and Faster R-CNN, also use this trick.
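
For reference, a minimal sketch of how this maps onto Caffe's `iter_size` solver parameter, in the style of the SSD-style Python training scripts (the concrete numbers below are illustrative, not the paper's exact settings):

```python
# Gradient accumulation in Caffe: the solver runs `iter_size` forward/backward
# passes (each with `batch_size` images) and sums the gradients before applying
# a single weight update, so the effective batch size is batch_size * iter_size.
batch_size = 8            # per-pass batch that fits in GPU memory (illustrative)
accum_batch_size = 128    # desired effective batch size

iter_size = accum_batch_size // batch_size   # 128 / 8 = 16 accumulation steps

solver_param = {
    'base_lr': 0.001,
    'iter_size': iter_size,   # Caffe updates the weights once every iter_size passes
    # ... other solver fields as in the SSD training script ...
}
```
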

I wonder how the batch size impacts the accuracy. If I finetune from a pretrained model (e.g., one trained on COCO) on the VOC dataset, can I decrease the batch size? I have not experimented with it yet since training is slow.

Hi @wangxiaoyaner,
We use the same batch size (128) and stepvalue (20000, 40000, ...) but a much smaller initial lr (0.001) when finetuning from the COCO model. I think a smaller batch size is also OK if you use the pretrained model, but I have not tested it yet.

Hi @szq0214, thank you for your reply! I tried to train Faster R-CNN (MATLAB version, VGG) from scratch and it failed to converge, just as you said~. To train SSD300 (VGG16) from scratch, what "accum_batch_size" and "max_iter" did you use to get the 69.6% result?

Hi @gd2016229035, we adopted accum_batch_size=128, initial lr=0.001, and stepvalue=[80000, 100000, 120000, 140000] for training SSD300 (VGGNet backbone) from scratch.
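
A hedged sketch of the corresponding solver settings for the from-scratch SSD300 run, again in the style of the SSD Python training scripts. The per-pass `batch_size` and `max_iter` values are not stated in this thread, so they are only assumed placeholders here:

```python
# Solver settings matching the reply above (accum_batch_size = 128, lr = 0.001,
# multistep schedule); batch_size and max_iter are assumptions for illustration.
batch_size = 8                                   # whatever fits on your GPU (assumed)
accum_batch_size = 128
iter_size = accum_batch_size // batch_size

solver_param = {
    'base_lr': 0.001,                            # initial learning rate
    'lr_policy': 'multistep',
    'stepvalue': [80000, 100000, 120000, 140000],
    'gamma': 0.1,                                # lr decay factor at each stepvalue
    'iter_size': iter_size,
    'momentum': 0.9,
    'weight_decay': 0.0005,
    'max_iter': 150000,                          # assumed; not given in the thread
}
```
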

Update:
The results of SSD and SSD (dense) trained from scratch in our paper were obtained with accum_batch_size=64. With 128, you will achieve better accuracy. We will include these results in our revision.