declare-lab/tango

16 GB of GPU memory runs out

itsmasabdi opened this issue · 2 comments

Hi.

I'm trying to train this model on a single P100 with 16 GB of memory, but I seem to run out of memory even with a batch size of 2. Do I need more than 16 GB for this model? How can I reduce the GPU memory usage?

Cheers,

Hey, you can try the following:

  1. Use a smaller text encoder and a smaller diffusion model if you are training from scratch.
  2. Use the Adafactor / 8-bit Adam optimizer. This should reduce memory consumption significantly.
  3. Use gradient checkpointing from accelerate.
  4. Use a batch size of 1 without augmentation.
  5. If memory still runs out, you need to use DeepSpeed ZeRO with CPU offload.
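For step 3, gradient checkpointing trades compute for memory: activations are recomputed during the backward pass instead of being stored. A minimal PyTorch sketch of the idea, using a toy stand-in model (the layer sizes are illustrative and not TANGO's actual architecture):

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in model; TANGO's real diffusion model is far larger.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 256),
    torch.nn.GELU(),
    torch.nn.Linear(256, 64),
)

x = torch.randn(1, 64, requires_grad=True)  # batch size 1, as in step 4

# Split the forward pass into 2 segments; activations inside each segment
# are recomputed in backward instead of being kept in GPU memory.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
loss = out.pow(2).mean()
loss.backward()
```

With Hugging Face models the same effect is usually one call, `model.gradient_checkpointing_enable()`, and accelerate works transparently on top of it.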

You can follow this guide: https://huggingface.co/docs/transformers/perf_train_gpu_one
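For step 5, a minimal DeepSpeed ZeRO stage-2 config with optimizer CPU offload might look like the sketch below (the batch and accumulation values are illustrative; the file is passed to the DeepSpeed launcher or to accelerate's DeepSpeed plugin):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true }
  }
}
```

Offloading optimizer states to CPU RAM is usually what makes a 16 GB card viable, at the cost of slower steps due to host-device transfers.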

Hi @deepanwayx,
I would like to ask whether you have trained TANGO with DeepSpeed.
I have run into some problems. Could you offer some advice?