Out Of Memory Issue LISA
harry7171 opened this issue · 4 comments
Hi,
I have been trying to use LISA to fine-tune on my domain-specific data.
I am not using LMFlow, though; instead I am using the DynamicLayerActivationCallback class with the HF Trainer.
I am fine-tuning on a single 80GB A100, using Mistral 7B in FP32, which occupies 29GB of memory.
But when I call trainer.train(), GPU memory usage ramps up and I get an OOM error.
Below is the error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 490.00 MiB. GPU 0 has a total capacty of 79.14 GiB of which 461.88 MiB is free. Including non-PyTorch memory, this process has 78.59 GiB memory in use. Of the allocated memory 76.86 GiB is allocated by PyTorch, and 1.28 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Going through the traceback, I didn't find any clue; the error was thrown in the forward call, in the softmax function.
I have been trying to figure this out for a few days. Please assist.
Thanks
Thanks for your interest in LMFlow! Loading the model alone requires 7B * 4 bytes/param = 28GB of memory, while full-parameter training requires 7B * (4+12) bytes/param = 112GB. Using one's own DynamicLayerActivationCallback may still incur the same memory consumption if the optimizer is not reinitialized every time.
To enable training of such models, you may use LMFlow's implementation, or change the float type to bf16, which should not affect performance much. Please feel free to let us know if you encounter further problems regarding this issue. Hope this information is helpful 😄
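The per-parameter arithmetic in this reply can be reproduced with a few lines of Python. This is only a rough sketch: it assumes 4 bytes each for fp32 weights and gradients plus 8 bytes for the two Adam moment buffers (the 4+12 split), ignores activations entirely, and uses a round 7e9 parameter count. The function name and the `trainable_fraction` parameter are illustrative and not part of LMFlow.

```python
def training_memory_gb(n_params, weight_bytes=4, grad_bytes=4,
                       optim_bytes=8, trainable_fraction=1.0):
    """Rough memory estimate in GB (1e9 bytes), excluding activations.

    trainable_fraction is the assumed share of parameters that actually
    hold gradients and optimizer state (1.0 = full-parameter training).
    """
    weights = n_params * weight_bytes
    train_state = n_params * trainable_fraction * (grad_bytes + optim_bytes)
    return (weights + train_state) / 1e9


N_PARAMS = 7e9  # assumed round figure for a "7B" model

# Weights only (nothing trainable): 7e9 * 4 bytes
print(f"load fp32:  {training_memory_gb(N_PARAMS, trainable_fraction=0.0):.0f} GB")

# Full-parameter fp32 training: 7e9 * (4 + 12) bytes
print(f"train fp32: {training_memory_gb(N_PARAMS):.0f} GB")

# Assumption: weights, gradients and optimizer state all kept in bf16
print(f"train bf16: {training_memory_gb(N_PARAMS, 2, 2, 4):.0f} GB")
```

With these assumptions, loading comes to 28 GB and full fp32 training to 112 GB, which already exceeds an 80GB card before any activations are counted. Note that a small `trainable_fraction` only models the case where optimizer state for frozen layers is actually released; if the optimizer is not reinitialized when the active layers change, that state may stay allocated.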
Hi, thanks for your quick response @research4pan.
Yes, you are right. I tried using bf16 and fine-tuning did work, but I was using only 1 layer for LISA.
With 2 layers I faced the same issue: training did start, but it hit OOM after just 32 steps.
I am a bit confused about this. Please let me know if I am missing something or doing something wrong.
We recommend using DynamicLayerActivationCallback together with paged_adamw, which allows occasional OOM to be handled gracefully. Hope that is helpful 😄
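As a hedged illustration of this suggestion, the settings could be collected as keyword arguments for the Hugging Face `TrainingArguments` class. This sketch assumes the optimizer name `paged_adamw_32bit` (the reply only says paged_adamw), that bitsandbytes is installed for the paged optimizer, and that `model`, `train_dataset`, and `lisa_callback` are defined elsewhere. The output path and batch settings are placeholders, not values from this thread.

```python
# Keyword arguments intended for transformers.TrainingArguments.
# Values marked "assumption" do not come from the discussion above.
training_kwargs = {
    "output_dir": "lisa-mistral-7b",      # assumption: hypothetical path
    "bf16": True,                         # bf16 training, as suggested earlier
    "optim": "paged_adamw_32bit",         # assumption: HF name for paged AdamW
    "per_device_train_batch_size": 1,     # assumption: small batch to limit activations
    "gradient_accumulation_steps": 8,     # assumption: keeps effective batch size at 8
}

# Intended usage (requires transformers and bitsandbytes; names below are
# placeholders for objects created elsewhere in the training script):
#
#   from transformers import Trainer, TrainingArguments
#   args = TrainingArguments(**training_kwargs)
#   trainer = Trainer(model=model, args=args, train_dataset=train_dataset,
#                     callbacks=[lisa_callback])
#   trainer.train()
```

The paged optimizer can move optimizer state to CPU memory when GPU memory runs short, which is why it may absorb occasional spikes rather than failing outright.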
Thanks @research4pan. I tried using bf16 for fine-tuning and it is working well enough; I will try paged_adamw too.
Thanks a lot for your help.