Please document settings for PyTorch memory and how to split across GPUs
Opened this issue · 2 comments
truedat101 commented
Currently, on a GPU with smaller VRAM, it hits an OOM exception. I have multiple GPUs, and on other systems I can specify "autodevices" or list the GPU numbers to use: CUDA:0,1,2. Please describe what changes to make in the code, or offer a flag to set these things.
pookiefoof commented
Hi @truedat101, can you post the VRAM OOM error log here?
Besides, run.py uses cuda:0 by default, and you can specify the GPU with the --device argument.
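For reference, a device flag like the one described usually resolves to a torch device string passed to model and tensor placement. This is only a minimal sketch of that pattern, not run.py's actual implementation; the flag name --device is taken from the comment above, everything else is illustrative:

```python
import argparse

# Sketch of a --device CLI flag resolving to a device string
# (hypothetical standalone example, not TripoSR's run.py).
parser = argparse.ArgumentParser()
parser.add_argument(
    "--device",
    default="cuda:0",
    help="torch device string, e.g. cuda:1 or cpu",
)

# In a real script this would be parser.parse_args(); an explicit
# argv is used here so the sketch runs standalone.
args = parser.parse_args(["--device", "cuda:1"])
print(args.device)  # -> cuda:1

# Downstream, the string would typically be used as:
#   device = torch.device(args.device)
#   model.to(device)
```

Note that --device selects a single GPU; it does not shard one model across several small GPUs.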
truedat101 commented
I will try it. I have a Frankenstein machine with 4 smaller GPUs (32 GB total) that I would like to put to work.
Trace from gradio:
return forward_call(*args, **kwargs)
File "/home/dkords/dev/repos/TripoSR/menv/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 116, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacity of 7.92 GiB of which 31.62 MiB is free. Including non-PyTorch memory, this process has 7.87 GiB memory in use. Of the allocated memory 7.60 GiB is allocated by PyTorch, and 164.74 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/dkords/dev/repos/TripoSR/menv/lib/python3.8/site-packages/gradio/queueing.py", line 501, in process_events
response = await self.call_prediction(awake_events, batch)
File "/home/dkords/dev/repos/TripoSR/menv/lib/python3.8/site-packages/gradio/queueing.py", line 465, in call_prediction
raise Exception(str(error) if show_error else None) from error
Exception: None
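The OOM message above suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation. One way to do that, assuming you launch via a Python entry point, is to set the environment variable before the first CUDA allocation (the file name launch.py below is hypothetical):

```python
import os

# Must be set before torch creates its first CUDA allocation, i.e.
# before any tensor lands on the GPU. Setting it at the top of a
# launcher script (or exporting it in the shell) achieves this.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # -> expandable_segments:True

# Equivalent from the shell:
#   PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python run.py ...
```

This only mitigates fragmentation on a single GPU; if the model's weights plus activations genuinely exceed 8 GiB, you would still need a larger card or a code change, since the --device flag targets one GPU at a time.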