LincanLi98/STG-Mamba

CUDA out of memory

Opened this issue · 6 comments

Impressive work!

While running the training code, I encountered a CUDA out of memory error.

Could you please advise on any settings that could reduce the memory requirements?

You can reduce the batch_size, default is 48.
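For anyone unsure where this applies: reducing the batch size shrinks the activation memory needed per training step roughly in proportion. A minimal sketch of the idea using a plain PyTorch `DataLoader` (the dataset shapes and the `batch_size=24` value here are illustrative, not taken from the repo's config):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the real spatio-temporal data.
dataset = TensorDataset(torch.randn(96, 12), torch.randn(96, 1))

# Halving the batch size (default reported as 48) roughly halves
# per-step activation memory, at the cost of more steps per epoch.
loader = DataLoader(dataset, batch_size=24, shuffle=True)

batch_x, batch_y = next(iter(loader))
print(batch_x.shape)  # first dimension is the batch size, 24
```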

I'm having the same issue: during training, GPU memory usage keeps increasing until it runs out.

Me too. During training, GPU memory usage keeps climbing regardless of the batch size. Could there be stale data accumulating during training that isn't freed in time?
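One common cause of memory growing every iteration in PyTorch training loops, offered here as a possibility to check rather than a confirmed diagnosis of this repo, is accumulating the loss *tensor* instead of its scalar value: the tensor keeps every step's computation graph alive, so memory can never be freed. A minimal sketch of the fix:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for step in range(10):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # BAD:  running_loss += loss        # retains each step's graph
    # GOOD: .item() converts to a plain Python float, so the graph
    # for this step can be garbage-collected.
    running_loss += loss.item()

print(type(running_loss))  # plain float, no graph retained
```

Searching the training loop for places where tensors with `requires_grad=True` are appended to lists or summed across iterations is a quick way to rule this in or out.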

Hi, I just ran a quick test on an RTX 4090 (24GB) GPU and didn't encounter the problem you mentioned at any point during the training session. The original work was trained on one A100 GPU. Throughout the session, the monitored memory consumption stayed around 27%-37% of total GPU memory. Here is the hardware I tested on:

CUDA 11.3
GPU: RTX 4090(24GB) * 1
CPU: 12 vCPU Intel(R) Xeon(R) Platinum 8352V CPU @ 2.10GHz
Memory: 90GB

Hope this helps.
(Screenshot of GPU memory usage during training, 2024-06-04)

@flww213 @leiershuai @ohhhh2022 I also see the increasing VRAM during training... do you figure out the reason?

The GPU memory cost gets higher and higher, and my 8GB 4060 laptop finally stopped at gen 129/199 QAQ.
I then tried a cloud GPU for training, without shared GPU memory: a 3090 with 24GB died at gen 135/199, only a little further than my 4060 laptop with 8GB + 16GB (shared memory).
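To confirm whether memory is really growing per epoch (rather than just spiking), it may help to log allocated memory at the same point in every epoch and compare. A sketch assuming PyTorch; the helper name is hypothetical:

```python
import torch

def allocated_mb() -> float:
    """Allocated GPU memory in MB; returns 0.0 on CPU-only machines."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 1024**2

# Call once per epoch, e.g. right after the epoch's last optimizer
# step. A number that rises steadily epoch over epoch points at
# tensors being retained across iterations rather than a batch size
# that is simply too large.
print(f"allocated: {allocated_mb():.1f} MB")
```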