ksugar/samapi

CUDA Out of Memory

ArezooGhodsifard opened this issue · 1 comment

Hello,

I'm encountering a CUDA out-of-memory (OOM) error when PyTorch tries to allocate an additional 768.00 MiB for model inference, even though my NVIDIA GeForce RTX 3060 (6 GB total capacity) appears to have enough free memory. The exact warning message is as follows:

UserWarning: cuda device found but got the error CUDA out of memory. Tried to allocate 768.00 MiB (GPU 0; 5.80 GiB total capacity; 1.95 GiB already allocated; 356.75 MiB free; 2.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF - using CPU for inference
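My understanding is that the `max_split_size_mb` hint from the warning is applied by setting `PYTORCH_CUDA_ALLOC_CONF` before PyTorch initializes CUDA. A minimal sketch of what I mean (the value 128 is an arbitrary example, not a recommendation):

```python
import os

# Must be set before the first "import torch" in the entry point,
# because the caching allocator reads it when the CUDA context is
# created. 128 is an illustrative value; smaller splits reduce
# fragmentation at some cost in allocation overhead.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # noqa: E402
```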

This occurs when I launch the application with Uvicorn; the model registrations performed at startup appear to trigger the failing allocation. My environment runs CUDA 11.7 with a matching PyTorch build, inside a Conda environment on Ubuntu.

Could you suggest ways to mitigate this OOM issue? Are there recommended PyTorch memory-management practices or configurations I should use to reduce GPU memory usage and avoid hitting this limit?
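For reference, these are the generic PyTorch inference practices I am already aware of; a minimal sketch, where `model` and `batch` are placeholders rather than samapi objects:

```python
import torch

def run_inference(model, batch):
    # inference_mode disables autograd, so activations are not
    # retained for a backward pass.
    with torch.inference_mode():
        # Autocast runs most ops in fp16 on CUDA, roughly halving
        # activation memory, and falls back to fp32 where needed.
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            return model(batch)

# Returns cached blocks to the driver after large allocations; it does
# not shrink live tensors, but it can reduce the "reserved" figure.
torch.cuda.empty_cache()
```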

Thank you for your support.

@ArezooGhodsifard Thank you for reporting the issue.
Which model did you use? With 6 GB of VRAM, vit_h and vit_l would be too heavy to run. Could you try vit_b or vit_t and see if it works?
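For example, here is a minimal sketch of loading the smaller vit_b variant with the segment-anything package directly (the checkpoint path is illustrative, and vit_t additionally requires the MobileSAM package):

```python
import torch
from segment_anything import SamPredictor, sam_model_registry

# vit_b (~375 MB checkpoint) needs far less VRAM than vit_h or vit_l.
# The filename below is the one published for vit_b; adjust the path
# to wherever you stored the checkpoint.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
sam.to(device="cuda" if torch.cuda.is_available() else "cpu")

predictor = SamPredictor(sam)
```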