OOM Issue (A6000 GPU, Batch Size 8 per GPU)
2minkyulee opened this issue · 0 comments
2minkyulee commented
Thanks for your great work!
I'm having an Out of Memory issue with the following configuration:
(This is probably the default training setting, with the same VRAM size as the original setup.)
- batch size per GPU = 8
- A6000 GPUs (48G VRAM)
- gt_size = 256
Disabling the CUDA prefetcher didn't help.
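For reference, the relevant part of my training options looks roughly like this (rewritten as a Python dict for illustration; the option names `batch_size_per_gpu`, `gt_size`, and `prefetch_mode` follow a BasicSR-style config and may not match this repo's YAML exactly):

```python
# Illustrative only: a BasicSR-style excerpt of the training options described
# above, written as a Python dict. The option names are assumptions, not copied
# from this repository's actual config file.
train_dataset_opts = {
    "gt_size": 256,            # ground-truth crop size
    "batch_size_per_gpu": 8,   # this value triggers OOM on a 48 GB A6000
    "prefetch_mode": None,     # CUDA prefetcher disabled; did not help
    "pin_memory": True,
}
```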
A batch size of 7 per GPU works fine, so the overflow seems to be marginal.
Gradient checkpointing (sketched below) also works, but it slows down training, so I'd prefer an alternative if one exists.
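For context, this is roughly what I mean by gradient checkpointing (a minimal plain-PyTorch sketch, not this repo's code; the wrapped block and tensor shapes are illustrative):

```python
# Minimal sketch of gradient checkpointing in plain PyTorch (illustrative, not
# this repo's code). Activations inside `block` are discarded after the forward
# pass and recomputed during backward, which saves VRAM but costs extra compute.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedStage(nn.Module):
    def __init__(self, block: nn.Module, use_checkpoint: bool = True):
        super().__init__()
        self.block = block
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        if self.use_checkpoint and self.training:
            # use_reentrant=False is the recommended mode in recent PyTorch versions.
            return checkpoint(self.block, x, use_reentrant=False)
        return self.block(x)

# Example with the sizes from this issue: batch 8, 256x256 crops.
stage = CheckpointedStage(nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()))
out = stage(torch.randn(8, 3, 256, 256, requires_grad=True))
```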
Are there any configurations that I might have missed to reduce VRAM usage?
Or do you have any other suggestions?