GPU out of memory problem
King4819 opened this issue · 8 comments
Awesome work !!!
I'm running into a GPU out-of-memory error on a single RTX 4090 (24 GB of memory). Is there any way to reduce GPU memory usage, e.g., by lowering the batch size? I can't find the batch size setting, and I don't know its default value.
The command I used: python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py --gpus 1
Thanks !
@King4819 hello,
Thank you for your interest in our work. The batch size per GPU is defined in the config file in data: samples_per_gpu. For example, it's 4 in configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py. During training, we used 4 GPUs to train the tiny model, so the effective batch size is 16: sh dist_train.sh configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py 4
For the small model, we set samples_per_gpu=2 with 8 GPUs to keep the effective batch size at 16. Maybe you could try a smaller samples_per_gpu with more GPUs?
Best,
Yifei
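The batch-size arithmetic in the reply above can be sketched in a few lines of Python. The function name effective_batch_size is only for illustration and is not part of the repository; the numbers are the ones quoted in the reply.

```python
def effective_batch_size(samples_per_gpu: int, num_gpus: int) -> int:
    """Total number of images that contribute to one optimizer step."""
    return samples_per_gpu * num_gpus


# Settings quoted in the reply above.
tiny = effective_batch_size(samples_per_gpu=4, num_gpus=4)
small = effective_batch_size(samples_per_gpu=2, num_gpus=8)

# A single GPU with the tiny config's default samples_per_gpu.
single_gpu = effective_batch_size(samples_per_gpu=4, num_gpus=1)

print(tiny, small, single_gpu)  # 16 16 4
```

With one GPU and the default samples_per_gpu, the effective batch size drops to 4, and lowering samples_per_gpu to save memory shrinks it further.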
Sorry, I'd like to ask: is it possible to run the whole work on a single RTX 4090 GPU?
If so, which settings should I change to make it fit on a single GPU?
Thanks!
In short, the answer is no. The ViT-Adapter is trained with a default batch_size=16 (batch_per_gpu x num_gpus), and performance will suffer greatly if a much smaller batch_size is used. Sorry about that.
If performance is not the focus, you can decrease samples_per_gpu to 2 or 1 and train on a single GPU with: python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py --gpus 1
@kaikai23 Thanks for your reply!
Do you think it would be possible to add gradient accumulation code to simulate a large batch size in your work?
Thanks!
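For readers unfamiliar with the idea, here is a minimal, framework-free sketch of gradient accumulation on a toy one-parameter regression. It is not the repository's training code; the data, the model and the helper grad_mse are made up for illustration. It shows that averaging the gradients of 8 micro-batches of 2 samples gives the same gradient as one batch of 16.

```python
import math


def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Toy data: 16 samples, standing in for an effective batch size of 16.
xs = [float(i) for i in range(1, 17)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

# Gradient computed on the full batch of 16 in one go.
full_grad = grad_mse(w, xs, ys)

# Same gradient, accumulated over 8 micro-batches of 2 samples each.
micro_batch = 2
steps = len(xs) // micro_batch
accumulated = 0.0
for i in range(steps):
    lo, hi = i * micro_batch, (i + 1) * micro_batch
    # Divide by the number of steps so the sum is an average over micro-batches.
    accumulated += grad_mse(w, xs[lo:hi], ys[lo:hi]) / steps

assert math.isclose(full_grad, accumulated, rel_tol=1e-9)
```

The equivalence is exact only when the loss is a plain average over samples. Layers that use per-batch statistics, such as BatchNorm, still see the small micro-batch, so accumulation may not fully reproduce large-batch behavior.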
@kaikai23 Sorry for all the questions. I have studied the reference, but I am still confused about how to modify the code (e.g., where to add accumulative_counts?). Would it be convenient for you to commit a new version of the code with gradient accumulation?
Thank you very much!
@kaikai23 I'd like to ask whether you experimented with different batch sizes when finetuning the pruned model. Is it necessary to use a batch size of 16? In other words, if I use a batch size of 2 or 1, will the final performance degrade noticeably?
Thanks a lot!
It seems that mmdetection has a built-in gradient accumulation function that can be used. Thanks!
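For reference, a minimal sketch of what this could look like in a config, assuming the repository runs on mmdetection 2.x with an mmcv 1.x release that ships GradientCumulativeOptimizerHook. The values are illustrative and untested on this codebase. If the config enables fp16 training, a different hook would likely be needed. Newer MMEngine-based releases expose the same idea through the accumulative_counts argument of the optimizer wrapper instead.

```python
# Per-GPU batch size small enough to fit in memory (illustrative value).
data = dict(samples_per_gpu=2)

# Accumulate gradients over 8 iterations before each optimizer step, so that
# 2 samples x 8 iterations approximates an effective batch size of 16.
optimizer_config = dict(
    type="GradientCumulativeOptimizerHook",  # from mmcv 1.x (assumed available)
    cumulative_iters=8,
)
```

The product of samples_per_gpu and cumulative_iters is what should match the original effective batch size of 16 on a single GPU.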