VAST-AI-Research/TripoSR

Please document settings for PYTORCH memory, how to split across GPUs


Currently, on a GPU with smaller VRAM, it hits an OOM exception. I have multiple GPUs, and on other systems I can specify "autodevices" or list the GPU numbers to use (CUDA:0,1,2). Please describe what changes to make in the code, or offer a flag to set these things.

Hi @truedat101, can you post the VRAM OOM error log here?
Also, run.py uses cuda:0 by default, and you can specify the GPU with the --device argument.
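
For reference, a minimal sketch (plain PyTorch, nothing TripoSR-specific) that lists the GPUs PyTorch can see along with their free and total VRAM, so you know which index to pass to --device:

    import torch

    # Enumerate the visible CUDA devices and report free/total VRAM,
    # so the least-loaded card can be passed to run.py via --device cuda:<index>.
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        free, total = torch.cuda.mem_get_info(i)
        print(f"cuda:{i}  {props.name}  "
              f"{free / 1024**3:.1f} GiB free / {total / 1024**3:.1f} GiB total")

run.py can then be launched with the chosen index, e.g. --device cuda:1.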

I will try it. I have a Frankenstein machine with 4 smaller GPUs (32 GB total) that I would like to put to work.

Trace from gradio:

    return forward_call(*args, **kwargs)
  File "/home/dkords/dev/repos/TripoSR/menv/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacity of 7.92 GiB of which 31.62 MiB is free. Including non-PyTorch memory, this process has 7.87 GiB memory in use. Of the allocated memory 7.60 GiB is allocated by PyTorch, and 164.74 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dkords/dev/repos/TripoSR/menv/lib/python3.8/site-packages/gradio/queueing.py", line 501, in process_events
    response = await self.call_prediction(awake_events, batch)
  File "/home/dkords/dev/repos/TripoSR/menv/lib/python3.8/site-packages/gradio/queueing.py", line 465, in call_prediction
    raise Exception(str(error) if show_error else None) from error
Exception: None
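
For anyone landing here with the same trace: the allocator hint in the error message above can be tried first. A minimal sketch, assuming the variable is set before the first CUDA allocation (putting it at the top of the launch script, before torch is imported, is the safest place):

    import os

    # Allocator setting suggested by the OOM message above; it must be in the
    # environment before PyTorch makes its first CUDA allocation.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

    import torch  # imported after the variable is set so the allocator picks it up

Note that this only mitigates fragmentation; the log shows 7.60 GiB already allocated by PyTorch on a 7.92 GiB card, so pointing run.py at a less loaded GPU with --device remains the more reliable fix.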