LUMIA-Group/rasat

Out of memory with default configs/train.json on 4×24 GB GPUs

shenyang0111ucf opened this issue · 9 comments

Hi @JiexingQi, I found you asked a similar question here: ServiceNow#29. I tried to train t5-3b with `CUDA_VISIBLE_DEVICES="0,1,2,3" python3 -m torch.distributed.launch --nnodes=1 --nproc_per_node=4 seq2seq/run_seq2seq.py configs/train.json`, even with a config like this:
"per_device_train_batch_size": 1,
"per_device_eval_batch_size": 1,
"gradient_accumulation_steps": 1,
"gradient_checkpointing": true,
But I still get an out-of-memory error, and all four GPUs' memory is used up (about 22 GB on each GPU).
I think you must have had similar experience when using the PICARD code. Could you show me how you solved this annoying out-of-memory problem? Thank you!
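For context, a back-of-envelope estimate shows why this setup runs out of memory: with `torch.distributed.launch` (data parallelism), every GPU holds a full replica of the model, gradients, and Adam optimizer state. The numbers below are approximate and assume standard fp32 fine-tuning; activations are excluded, so the real footprint is even larger.

```python
# Rough memory estimate for full fine-tuning of T5-3B with Adam,
# fp32 throughout (weights, gradients, optimizer state). Activation
# memory is ignored, so this is a lower bound per GPU under DDP.
params = 3e9                       # ~3 billion parameters
bytes_weights = 4 * params         # fp32 parameters
bytes_grads   = 4 * params         # fp32 gradients
bytes_adam    = 8 * params         # Adam momentum + variance (fp32 each)

total_gb = (bytes_weights + bytes_grads + bytes_adam) / 1024**3
print(f"~{total_gb:.0f} GB per GPU before activations")  # ~45 GB
```

Since data-parallel training replicates this full ~45 GB state on each card, four 24 GB GPUs do not help; the state has to be sharded or split (e.g. model parallelism or an optimizer-sharding scheme) for it to fit.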

Hi, @shenyang0111ucf which type of GPU do you use?


@JiexingQi 4 NVIDIA TITAN RTX 24 GB cards.

A 24 GB GPU does not seem to be enough to train the T5-3B model; we use a 40 GB A100 to train it (the same as PICARD). By the way, evaluation can be run on a 24 GB 3090 GPU.


I want to use four 24 GB graphics cards instead of one 40 GB A100 to train the model. Do you have any experience with `torch.distributed.launch` to make that work?

Maybe you could try model parallelism in this situation, but I have not tried it myself.


Ok, I will try to figure out how to fix this problem with model parallelism. Thank you for your time!
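A naive model-parallel setup along these lines could look like the sketch below. It assumes Hugging Face's (now-deprecated) `T5ForConditionalGeneration.parallelize()` API, which places blocks of layers on different GPUs according to a device map; the `make_device_map` helper is hypothetical, and the GPU-dependent lines are commented out since this has not been tested on actual hardware.

```python
# Sketch: splitting T5-3B's 24 transformer blocks across four GPUs with
# Hugging Face's naive model parallelism. make_device_map is a
# hypothetical helper; the parallelize() call itself is a real (deprecated)
# transformers API but is left commented out as it requires GPUs.

def make_device_map(num_layers, num_gpus):
    """Assign an equal contiguous share of layer indices to each GPU."""
    per_gpu = (num_layers + num_gpus - 1) // num_gpus  # ceiling division
    return {
        gpu: list(range(gpu * per_gpu, min((gpu + 1) * per_gpu, num_layers)))
        for gpu in range(num_gpus)
    }

# T5-3B has 24 encoder blocks; spread them over 4 GPUs.
device_map = make_device_map(24, 4)
print(device_map)  # {0: [0..5], 1: [6..11], 2: [12..17], 3: [18..23]}

# from transformers import T5ForConditionalGeneration
# model = T5ForConditionalGeneration.from_pretrained("t5-3b")
# model.parallelize(device_map)   # place blocks on GPUs per the map
# ... train as usual; model.deparallelize() moves everything back ...
```

Note this only splits the weights; each GPU still needs room for its share of gradients, optimizer state, and activations, so batch size may still need to stay at 1.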

You are welcome!

Excuse me, regarding the question raised by @shenyang0111ucf, I would like to ask whether T5-3B can run on 4 NVIDIA GeForce RTX 3090 graphics cards, each with 24 GB. Thank you. @JiexingQi


I think it is not enough for training, but it works for evaluation.