uzh-rpg/svit

GPU out of memory problem

Closed this issue · 8 comments

Awesome work!!!
I'm running into a GPU out-of-memory error. The GPU I'm using is a single RTX 4090 (24 GB RAM). Is there any way to reduce GPU memory usage, e.g., the batch size? (I can't find the batch size setting and I don't know its default value.)

The command I used: python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py --gpus 1

Thanks!

@King4819 hello,
Thank you for your interest in our work. The batch size per GPU is defined in the config file in data: samples_per_gpu. For example, it's 4 in configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py. During training, we used 4 GPUs to train the tiny model, so the effective batch size is 16: sh dist_train.sh configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py 4.
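To make the effective batch size explicit, it is simply the product of the per-GPU batch size and the number of GPUs (values here are the tiny-model numbers quoted above):

```python
# Effective batch size = samples_per_gpu x number of GPUs
# (values from the tiny-model setup described above).
samples_per_gpu = 4   # data.samples_per_gpu in the config
num_gpus = 4          # GPUs passed to dist_train.sh
effective_batch_size = samples_per_gpu * num_gpus
print(effective_batch_size)  # 16
```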

For the small model, we set samples_per_gpu=2 with 8 GPUs to keep the effective batch size at 16. Maybe you could try a smaller samples_per_gpu with more GPUs?

Best,
Yifei

Sorry, I want to ask: is it possible to run the whole pipeline on one single RTX 4090 GPU?
And which settings should I change to fit on a single GPU?
Thanks!

In short, the answer is no. The ViT-Adapter is trained with a default batch_size=16 (batch_per_gpu x num_gpus), and performance will suffer greatly if a much smaller batch size is used. Sorry about that.

In case performance is not the focus, you can decrease samples_per_gpu to 2 or 1 and train on a single GPU with: python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py --gpus 1
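If the repo's train.py follows mmdetection 2.x conventions, the per-GPU batch size may also be overridable from the command line without editing the config file. This assumes the script exposes mmdetection's --cfg-options flag; if it does not, edit data.samples_per_gpu in the config directly:

```shell
# Hedged sketch: assumes train.py forwards mmdetection 2.x's --cfg-options.
python train.py configs/mask_rcnn/svit-adapter-t-0.5x-ftune.py \
    --gpus 1 --cfg-options data.samples_per_gpu=1
```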

@kaikai23 Thanks for you reply!
Do you think it is possible to add gradient accumulation code to simulate large batch size on your work ?
Thanks!

Yes, I think so. Here are some references: here1 and here2
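The idea in those references can be sketched in plain PyTorch (this is an illustrative loop, not the repo's actual training code): scale each micro-batch loss by the number of accumulation steps, call backward() every iteration so gradients sum in p.grad, and only step the optimizer once per virtual batch.

```python
import torch

# Minimal gradient-accumulation sketch (plain PyTorch, toy model/data).
# With micro-batches of 2 and accum_steps=8, each optimizer step sees
# gradients averaged over a virtual batch of 16.
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

accum_steps = 8
data = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(16)]

w_before = model.weight.detach().clone()

optimizer.zero_grad()
for i, (x, y) in enumerate(data):
    # Scale the loss so the summed gradients average over the virtual batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in p.grad across micro-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()       # one update per virtual batch of 16 samples
        optimizer.zero_grad()  # reset for the next virtual batch
```

Note that techniques which depend on the true per-step batch (e.g. BatchNorm statistics) still see only the small micro-batch, so accumulation is not a perfect substitute for a real large batch.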

@kaikai23 Sorry for the questions. I have studied the references, but I am still confused about how to modify the code (e.g., where should I add accumulative_counts?). Would it be convenient for you to commit code that has gradient accumulation?
Thank you very much!

@kaikai23 I want to ask: did you experiment with different batch sizes when fine-tuning the pruned model? Is it necessary to use a batch size of 16? In other words, if I use a batch size of 2 or 1, will the final performance degrade noticeably?
Thanks a lot!

It seems mmdetection has a built-in gradient accumulation function that I can use. Thanks!
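For anyone landing here later: the built-in mechanism is mmcv's GradientCumulativeOptimizerHook, which can be enabled from the config by replacing the optimizer_config entry. A hedged sketch (field names follow mmcv 1.x; check your installed version):

```python
# Config fragment: accumulate gradients over 8 iterations, so
# samples_per_gpu=2 on a single GPU behaves like an effective batch of 16.
optimizer_config = dict(
    type='GradientCumulativeOptimizerHook',
    cumulative_iters=8,
)
```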