abacaj/fine-tune-mistral

RTX 3090 out of memory

DietmarGrabowski opened this issue · 2 comments

Hi, I am running out of memory:
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 732.00 MiB (GPU 0; 24.00 GiB total capacity; 20.62 GiB already allocated; 0 bytes free; 22.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

abacaj commented

Hi, it sounds like you are running on a single GPU. You should probably look into QLoRA, because this setup does full fine-tuning and needs either multiple GPUs or more VRAM than a single 24 GiB card provides.
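For a rough sense of why a single 24 GiB card is not enough, here is a back-of-the-envelope sketch. The parameter count (about 7.24 billion for Mistral 7B) and the bytes-per-parameter figures are assumptions for illustration, not measurements from this repo, and activations are ignored entirely. The function names are made up for this example.

```python
# Back-of-the-envelope GPU memory estimate. All constants are assumptions.
GIB = 1024 ** 3

N_PARAMS = 7.24e9   # approximate Mistral 7B parameter count (assumption)
GPU_GIB = 24.0      # total capacity reported in the error message above


def full_finetune_gib(n_params: float, bytes_per_param: float = 16.0) -> float:
    """Weights + gradients + AdamW states, ignoring activations.

    16 bytes/param assumes mixed precision with fp32 master weights:
    2 (half-precision weights) + 2 (gradients) + 4 (master weights)
    + 4 + 4 (the two Adam moment buffers). Pass 8.0 for a leaner
    all-bf16 weights/gradients/optimizer setup.
    """
    return n_params * bytes_per_param / GIB


def quantized_weights_gib(n_params: float, bits: int = 4) -> float:
    """Frozen base weights stored at `bits` bits each, as in QLoRA.

    Ignores the small LoRA adapters, their optimizer states,
    quantization constants, and activations.
    """
    return n_params * bits / 8 / GIB


if __name__ == "__main__":
    print(f"full fine-tune (16 B/param): {full_finetune_gib(N_PARAMS):.1f} GiB")
    print(f"full fine-tune (8 B/param):  {full_finetune_gib(N_PARAMS, 8.0):.1f} GiB")
    print(f"4-bit base weights:          {quantized_weights_gib(N_PARAMS):.1f} GiB")
    print(f"available on one card:       {GPU_GIB:.1f} GiB")
```

Under these assumptions even the leaner estimate is well above 24 GiB before counting activations, while the 4-bit base weights leave room for adapters and activations. That is also why `max_split_size_mb` is unlikely to help here: reserved memory (22.68 GiB) is close to allocated memory (20.62 GiB), so this looks like a capacity limit rather than fragmentation.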

Yes, I missed this. Thank you.