CUDA out of memory error
MichaelGMoore opened this issue · 0 comments
I am running the full app on Ubuntu 22.10 with an RTX 3060 12 GB GPU running CUDA 12.0.
I launch the app with the command:
lightning run app app.py --env NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
I was able to train the "supervised" model, but when attempting to train the "semi-super" model, the app crashes. The terminal shows this just before the crash:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 11.76 GiB total capacity; 9.16 GiB already allocated; 38.31 MiB free; 9.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Clearly the error is making a suggestion, but I am not knowledgeable enough to know how to adjust that parameter in practice. Any suggestions?
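For reference, the option the error message mentions is read from the `PYTORCH_CUDA_ALLOC_CONF` environment variable, which must be set before the process initializes CUDA. A minimal sketch (the `128` MB value is illustrative only, not taken from this report, and would need tuning):

```shell
# Set the allocator option the OOM message suggests; PyTorch reads
# PYTORCH_CUDA_ALLOC_CONF at startup, so export it before launching.
# max_split_size_mb:128 is an illustrative value, not a recommendation.
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128"
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Presumably the same variable could also be passed through the app's existing `--env` flag alongside `NVIDIA_DRIVER_CAPABILITIES`, though whether that reaches the training process early enough is an assumption here.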