facebookresearch/DPR

Question: Is there a way to reduce GPU RAM consumption?

ali-abz opened this issue · 2 comments

Hello there.
I am trying to train a model on a single-GPU system (RTX 2080, 12 GB) and I run out of CUDA memory with batch sizes greater than 3. For my Master's thesis I need to train the model with much larger batch sizes, like 8 or 12 (with BERT as the core model), but since I only have access to one GPU, I cannot go beyond 3 without hitting OOM errors.

I was wondering if I could make any changes to the DPR architecture to reduce its memory usage. As far as I understand, DPR is designed to run on multi-GPU setups. Can I expect reduced memory usage if I remove this multi-GPU support, or is it irrelevant? Is there anything I can do to achieve larger batch sizes other than buying more GPUs?
I would appreciate any tips on this matter.
With best regards,
Ali.

Hi @ali-abz ,
The DPR codebase doesn't have GPU RAM optimizations. There are, however, some follow-up works that address this issue.
Please have a look at this repo & paper:
https://github.com/luyug/GradCache
https://arxiv.org/pdf/2101.06983.pdf
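For context, the core idea behind GradCache is that the in-batch-negatives contrastive loss only needs the full batch of *representations*, not the full batch of encoder activations. You can therefore encode in small chunks without gradients, compute the loss and the gradients with respect to the cached representations, and then re-encode each chunk with gradients enabled to backpropagate those cached gradients through the encoder. Below is a minimal PyTorch sketch of that technique (not the GradCache library's actual API; `encoder`, `queries`, and `passages` are placeholder names, and a single shared encoder is assumed for simplicity):

```python
import torch
import torch.nn.functional as F

def grad_cache_step(encoder, queries, passages, chunk_size, optimizer):
    """One training step using the gradient-cache technique: peak activation
    memory scales with chunk_size instead of the full batch size, while the
    resulting gradients match full-batch in-batch-negatives training."""
    # 1) Forward pass without gradients, in small chunks, to build the
    #    representation cache. No activation graph is kept here.
    with torch.no_grad():
        q_reps = torch.cat([encoder(c) for c in queries.split(chunk_size)])
        p_reps = torch.cat([encoder(c) for c in passages.split(chunk_size)])

    # 2) Compute the full-batch contrastive loss on the cached reps and take
    #    gradients w.r.t. the representations only (cheap: no encoder graph).
    q_reps = q_reps.detach().requires_grad_()
    p_reps = p_reps.detach().requires_grad_()
    scores = q_reps @ p_reps.T                      # in-batch negatives
    labels = torch.arange(scores.size(0))           # diagonal = positives
    loss = F.cross_entropy(scores, labels)
    loss.backward()
    q_grads, p_grads = q_reps.grad, p_reps.grad

    # 3) Re-encode each chunk WITH gradients and backprop the cached
    #    representation gradients through the encoder; chunk gradients
    #    accumulate into the same .grad buffers.
    optimizer.zero_grad()
    for data, grads in ((queries, q_grads), (passages, p_grads)):
        for chunk, g in zip(data.split(chunk_size), grads.split(chunk_size)):
            reps = encoder(chunk)
            reps.backward(g)
    optimizer.step()
    return loss.item()
```

Since the loss is linear in each representation's gradient, summing the per-chunk backward passes reproduces the exact full-batch gradient, so you trade extra compute (one additional forward pass) for memory rather than approximating the loss. The GradCache repo linked above packages this pattern for Hugging Face models, including DPR-style bi-encoders.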

Thanks, I will check them out.