szq0214/DSOD

What is your trick to overcome the GPU memory constraints?

gd2016229035 opened this issue · 5 comments

I can only train with a batch size of 6 on my single TITAN X (Pascal) without running out of memory. So what is your trick to overcome the GPU memory constraints in the paper?
Thank you~~

Hi @gd2016229035, 'accum_batch_size' is the effective batch size, and you can set it to a relatively large value. The trick is to accumulate gradients over two training iterations before each weight update, which is already implemented in Caffe. Many other methods, such as SSD and Faster R-CNN, also use this trick.
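
For reference, a minimal sketch of how this maps onto Caffe's `iter_size` solver parameter, in the style of the SSD-style Python training scripts (the concrete numbers below are illustrative, not the paper's exact settings):

```python
# Gradient accumulation in Caffe: the solver runs `iter_size` forward/backward
# passes (each with `batch_size` images) and sums the gradients before applying
# a single weight update, so the effective batch size is batch_size * iter_size.
batch_size = 8            # per-pass batch that fits in GPU memory (illustrative)
accum_batch_size = 128    # desired effective batch size

iter_size = accum_batch_size // batch_size   # 128 / 8 = 16 accumulation steps

solver_param = {
    'base_lr': 0.001,
    'iter_size': iter_size,   # Caffe updates the weights once every iter_size passes
    # ... other solver fields as in the SSD training script ...
}
```
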

I wonder how the batch size impacts the accuracy. If I finetune from a pretrained model (e.g., one trained on COCO) on the VOC dataset, can I decrease the batch size? I have not experimented with it yet since training is slow.

Hi @wangxiaoyaner,
We use the same batch size (128) and stepvalue (20000, 40000, ...) but a much smaller initial lr (0.001) when finetuning from the COCO model. I think a smaller batch size is also OK if you use the pretrained model, but I have not tested it yet.

Hi @szq0214, thank you for your reply! I tried to train Faster R-CNN (MATLAB version, VGG) from scratch and it failed to converge, just as you said~. To train SSD300 (VGG16) from scratch, what "accum_batch_size" and "max_iter" did you use to get the 69.6% result?

Hi @gd2016229035, we adopted accum_batch_size=128, initial lr=0.001, and stepvalue=[80000, 100000, 120000, 140000] for training SSD300 (VGGNet backbone) from scratch.
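
A hedged sketch of the corresponding solver settings for the from-scratch SSD300 run, again in the style of the SSD Python training scripts. The per-pass `batch_size` and `max_iter` values are not stated in this thread, so they are only assumed placeholders here:

```python
# Solver settings matching the reply above (accum_batch_size = 128, lr = 0.001,
# multistep schedule); batch_size and max_iter are assumptions for illustration.
batch_size = 8                                   # whatever fits on your GPU (assumed)
accum_batch_size = 128
iter_size = accum_batch_size // batch_size

solver_param = {
    'base_lr': 0.001,                            # initial learning rate
    'lr_policy': 'multistep',
    'stepvalue': [80000, 100000, 120000, 140000],
    'gamma': 0.1,                                # lr decay factor at each stepvalue
    'iter_size': iter_size,
    'momentum': 0.9,
    'weight_decay': 0.0005,
    'max_iter': 150000,                          # assumed; not given in the thread
}
```
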

Update:
The results of SSD and SSD (dense) trained from scratch in our paper were obtained with accum_batch_size=64. With 128, you will achieve better accuracy. We will include these results in our revision.