ygjwd12345/TransDepth

[Memory Address Question] How to control gpu memory usage in this code?

Closed this issue · 3 comments

Thank you for your excellent work.
I encountered a CUDA out of memory error while running your code. I suspect this is caused by a lack of GPU memory.
Because of this, I increased num_threads in the multi-GPU part of your code and reduced the batch size, but the error still does not go away. Do you happen to know how to control this?

Below is the full error output.

```
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/cv1/TransDepth/pytorch/bts_main.py", line 347, in main_worker
    model = BtsModel(args)
  File "/home/cv1/TransDepth/pytorch/bts.py", line 345, in __init__
    self.encoder = ViT_seg(config_vit, img_size=[params.input_height,params.input_width], num_classes=config_vit.n_classes).cuda()
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/home/cv1/miniconda3/envs/transdepth/lib/python3.6/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 10.76 GiB total capacity; 400.86 MiB already allocated; 66.69 MiB free; 452.00 MiB reserved in total by PyTorch)
```
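For reference, here is a minimal, generic sketch (not part of TransDepth) for checking how much memory each visible GPU has before the model is moved onto it. It uses only long-standing torch.cuda calls; the MiB conversion and function name are illustrative.

```python
import torch

def report_gpu_memory():
    """Print total, allocated, and reserved memory (MiB) for every visible GPU."""
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return
    mib = 1024 ** 2
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        total = props.total_memory / mib
        allocated = torch.cuda.memory_allocated(i) / mib   # allocations of this process only
        reserved = torch.cuda.memory_reserved(i) / mib     # caching allocator's reserved pool
        print(f"GPU {i} ({props.name}): total={total:.0f} MiB, "
              f"allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

if __name__ == "__main__":
    report_gpu_memory()
```

Note that this only reports memory allocated by the current process; memory held by other processes on the same card will not appear here, so it is also worth checking nvidia-smi when the error message shows very little free memory.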

The problem is CUDA out of memory. I would suggest reducing the batch size further.
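If a smaller per-step batch hurts training, gradient accumulation is one generic way to trade compute for memory while keeping the same effective batch size. This is a minimal PyTorch sketch with placeholder model, loss, and data, not the actual bts_main.py training loop:

```python
import torch

# Hypothetical placeholders -- not the TransDepth model or data loader.
model = torch.nn.Linear(128, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

accum_steps = 4  # effective batch size = per-step batch size * accum_steps
loader = [(torch.randn(2, 128), torch.randn(2, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    x, y = x.cuda(), y.cuda()
    loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated gradients average correctly
    loss.backward()                            # gradients accumulate across the small batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```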

I run it on 4 V100 GPUs.

Closed due to long periods of inactivity