16 GB of GPU memory runs out
itsmasabdi opened this issue · 2 comments
itsmasabdi commented
Hi.
I'm trying to train this model on a single P100 with 16 GB of memory, but I seem to run out of memory even with a batch size of 2. Do I need more than 16 GB for this model? How can I reduce the GPU memory usage?
Cheers,
deepanwayx commented
Hey, you can try the following:
- Use a smaller text encoder and a smaller diffusion model if you are training from scratch.
- Use the Adafactor or 8-bit Adam optimizer. This should reduce memory consumption significantly (see the sketch below).
- Use gradient checkpointing from accelerate.
- Use a batch size of 1 without augmentation.
- If memory still runs out, you will need to use DeepSpeed ZeRO with CPU offload (also sketched below).
You can follow this guide: https://huggingface.co/docs/transformers/perf_train_gpu_one
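For reference, here is a minimal sketch of the 8-bit Adam, gradient checkpointing, and batch-size-1-with-accumulation points. It is not the actual tango training script; `google/flan-t5-small` stands in for the text encoder, and the loss is a dummy placeholder for the diffusion loss:

```python
import bitsandbytes as bnb
from accelerate import Accelerator
from transformers import AutoTokenizer, T5EncoderModel

# Gradient accumulation lets a batch size of 1 act like a larger effective batch.
accelerator = Accelerator(gradient_accumulation_steps=8)

# Stand-in for the text encoder; in practice load the (smaller) encoder from your own config.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
encoder = T5EncoderModel.from_pretrained("google/flan-t5-small")
encoder.gradient_checkpointing_enable()  # recompute activations in the backward pass to save memory
encoder.train()

# 8-bit AdamW keeps optimizer states in 8 bits (~4x smaller than fp32 Adam states).
# transformers.Adafactor is an alternative with an even smaller optimizer state.
optimizer = bnb.optim.AdamW8bit(encoder.parameters(), lr=1e-4)

encoder, optimizer = accelerator.prepare(encoder, optimizer)

texts = ["a dog barking in the distance"]  # batch size of 1, no augmentation
for step in range(8):
    with accelerator.accumulate(encoder):
        inputs = tokenizer(texts, return_tensors="pt").to(accelerator.device)
        hidden = encoder(**inputs).last_hidden_state
        loss = hidden.pow(2).mean()  # dummy loss; the real script computes the diffusion loss here
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```

The same idea applies to the diffusion model: diffusers UNets expose `enable_gradient_checkpointing()`.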
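If that is still not enough, here is a sketch of DeepSpeed ZeRO stage 2 with CPU offload configured through accelerate's `DeepSpeedPlugin` (the same setup can be created interactively with `accelerate config`; the model/optimizer names are placeholders):

```python
from accelerate import Accelerator, DeepSpeedPlugin

# ZeRO stage 2 shards optimizer states and gradients; offloading the optimizer
# states to CPU frees additional GPU memory at the cost of slower steps.
ds_plugin = DeepSpeedPlugin(
    zero_stage=2,
    offload_optimizer_device="cpu",
    gradient_accumulation_steps=8,
)
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=ds_plugin)

# prepare() wraps everything in the DeepSpeed engine; DeepSpeed must be installed
# and the script has to be started with `accelerate launch train.py`.
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```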
chenxinglili commented
Hi, @deepanwayx
I would like to ask whether you have trained tango with DeepSpeed.
I have encountered some problems. Can you provide some advice?