uzh-rpg/svit

GPU out of memory problem

Closed this issue · 8 comments

Awesome work!!!
I'm running into a GPU out-of-memory error. The GPU I'm using is a single RTX 4090 (24 GB RAM). Is there any way to reduce GPU memory usage, e.g., the batch size? (I can't find the batch size setting and I don't know its default value.)

The command I used: python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py --gpus 1

Thanks!

@King4819 hello,
Thank you for your interest in our work. The batch size per GPU is defined in the config file in data: samples_per_gpu. For example, it's 4 in configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py. During training, we used 4 GPUs to train the tiny model, so the effective batch size is 16: sh dist_train.sh configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py 4.
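To make the effective batch size explicit, it is simply the product of the per-GPU batch size and the number of GPUs (values here are the tiny-model numbers quoted above):

```python
# Effective batch size = samples_per_gpu x number of GPUs
# (values from the tiny-model setup described above).
samples_per_gpu = 4   # data.samples_per_gpu in the config
num_gpus = 4          # GPUs passed to dist_train.sh
effective_batch_size = samples_per_gpu * num_gpus
print(effective_batch_size)  # 16
```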

For the small model, we set samples_per_gpu=2 with 8 GPUs to keep the effective batch size at 16. Maybe you could try a smaller samples_per_gpu with more GPUs?

Best,
Yifei

Sorry, I want to ask: is it possible to run the whole pipeline on one single RTX 4090 GPU?
And which settings should I change to fit on a single GPU?
Thanks!

In short, the answer is no. The ViT-Adapter is trained with a default batch_size=16 (batch_per_gpu x num_gpus), and performance will suffer greatly if a much smaller batch size is used. Sorry about that.

In case performance is not the focus, you can decrease samples_per_gpu to 2 or 1 and train on a single GPU with: python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py --gpus 1
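If the repo's train.py follows mmdetection 2.x conventions, the per-GPU batch size may also be overridable from the command line without editing the config file. This assumes the script exposes mmdetection's --cfg-options flag; if it does not, edit data.samples_per_gpu in the config directly:

```shell
# Hedged sketch: assumes train.py forwards mmdetection 2.x's --cfg-options.
python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py \
    --gpus 1 --cfg-options data.samples_per_gpu=1
```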

@kaikai23 Thanks for you reply!
Do you think it is possible to add gradient accumulation code to simulate large batch size on your work ?
Thanks!

Yes, I think so. Here are some references: here1 and here2
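The idea in those references can be sketched in plain PyTorch (this is an illustrative loop, not the repo's actual training code): scale each micro-batch loss by the number of accumulation steps, call backward() every iteration so gradients sum in p.grad, and only step the optimizer once per virtual batch.

```python
import torch

# Minimal gradient-accumulation sketch (plain PyTorch, toy model/data).
# With micro-batches of 2 and accum_steps=8, each optimizer step sees
# gradients averaged over a virtual batch of 16.
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

accum_steps = 8
data = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(16)]

w_before = model.weight.detach().clone()

optimizer.zero_grad()
for i, (x, y) in enumerate(data):
    # Scale the loss so the summed gradients average over the virtual batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in p.grad across micro-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()       # one update per virtual batch of 16 samples
        optimizer.zero_grad()  # reset for the next virtual batch
```

Note that techniques which depend on the true per-step batch (e.g. BatchNorm statistics) still see only the small micro-batch, so accumulation is not a perfect substitute for a real large batch.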

@kaikai23 Sorry for the questions. I have studied the references, but I am still confused about how to modify the code (e.g., where should I add accumulative_counts?). Would it be convenient for you to commit code that has gradient accumulation?
Thank you very much!

@kaikai23 I want to ask: did you experiment with different batch sizes when fine-tuning the pruned model? Is it necessary to use a batch size of 16? In other words, if I use a batch size of 2 or 1, will the final performance degrade noticeably?
Thanks a lot!

It seems mmdetection has a built-in gradient accumulation function that I can use. Thanks!
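For anyone landing here later: the built-in mechanism is mmcv's GradientCumulativeOptimizerHook, which can be enabled from the config by replacing the optimizer_config entry. A hedged sketch (field names follow mmcv 1.x; check your installed version):

```python
# Config fragment: accumulate gradients over 8 iterations, so
# samples_per_gpu=2 on a single GPU behaves like an effective batch of 16.
optimizer_config = dict(
    type='GradientCumulativeOptimizerHook',
    cumulative_iters=8,
)
```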