hbb1/2d-gaussian-splatting

RuntimeError: CUDA out of memory

autumn999999 opened this issue · 4 comments

Thanks for your excellent work!
In my training process, the image resolution is 2048*2048 and there is no downscaling (using the command '-r 1'). I want to train for 30k iterations, but every run hits OOM at around 20k. Could you tell me how to modify the code to fix this problem?

This is my complete command: python train.py -s /home/xxx/2d-gaussian-splatting/dataset/paper -m /home/xxx/2d-gaussian-splatting/result -r 1 --depth_ratio 1
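(Side note, as a sketch of a possible mitigation rather than a confirmed fix: assuming a lower input resolution is acceptable for your scene, the same -r flag can downscale the images instead of keeping them at full size, e.g. '-r 2' halves each dimension and roughly quarters per-pixel memory:

python train.py -s /home/xxx/2d-gaussian-splatting/dataset/paper -m /home/xxx/2d-gaussian-splatting/result -r 2 --depth_ratio 1)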

This is the error:
Traceback (most recent call last):
File "/home/xxx/2d-gaussian-splatting/train.py", line 253, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint)
File "/home/xxx/2d-gaussian-splatting/train.py", line 69, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background)
File "/home/xxx/2d-gaussian-splatting/gaussian_renderer/init.py", line 97, in render
rendered_image, radii, allmap = rasterizer(
File "/home/xxx/anaconda3/envs/sugar/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xxx/anaconda3/envs/sugar/lib/python3.9/site-packages/diff_surfel_rasterization/init.py", line 212, in forward
return rasterize_gaussians(
File "/home/xxx/anaconda3/envs/sugar/lib/python3.9/site-packages/diff_surfel_rasterization/init.py", line 32, in rasterize_gaussians
return _RasterizeGaussians.apply(
File "/home/xxx/anaconda3/envs/sugar/lib/python3.9/site-packages/diff_surfel_rasterization/init.py", line 92, in forward
num_rendered, color, depth, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: CUDA out of memory. Tried to allocate 3.26 GiB (GPU 0; 23.67 GiB total capacity; 11.02 GiB already allocated; 2.14 GiB free; 17.60 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
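One cheap thing to try, since the error message itself points at fragmentation (reserved 17.60 GiB vs. 11.02 GiB allocated): set the standard PyTorch allocator option it mentions before launching training. This is a generic PyTorch knob, not something specific to this repo, and the 128 MB value is just an example; it won't help if the rasterizer genuinely needs more than the 24 GB on the card.

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py -s /home/xxx/2d-gaussian-splatting/dataset/paper -m /home/xxx/2d-gaussian-splatting/result -r 1 --depth_ratio 1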

What GPU do you use ?
How much RAM do you have ?

If you have a lot of system memory, it's possible in the NVIDIA Control Panel to expand GPU memory by falling back to it when needed (select "Prefer System Fallback").
(screenshot: CUDA_FALLBACK setting)

This helped me a lot when I still only had my pretty RTX 2070 SUPER ^-^

Thanks for your help! My GPU is a GeForce RTX 3090, and free memory is 30 GB. I'm sorry that I'm not very familiar with graphics card settings. Is this method also applicable under Ubuntu?

hbb1 commented

Can you post the training log? What is the number of points? I have no idea why the error occurs at 20k; from my understanding, no more points are added after 15k, so densification shouldn't cause OOM by then.
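To gather those numbers, a minimal sketch of per-iteration logging you could drop into the training loop in train.py, assuming the GaussianModel from this codebase where gaussians.get_xyz holds one row per point (adapt the names if yours differ):

import torch

def log_memory(iteration, gaussians, every=1000):
    # Print point count and CUDA memory usage every `every` iterations.
    if iteration % every != 0:
        return
    num_points = gaussians.get_xyz.shape[0]
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"[iter {iteration}] points: {num_points}, "
          f"allocated: {allocated:.2f} GiB, reserved: {reserved:.2f} GiB")

Call it once per iteration (e.g. right after the render call) and check whether the point count or the reserved memory keeps climbing up to 20k.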

Hmmm, sorry dude, I thought you were on Windows x)
Normally on Linux, it's done automatically =D

Strange... With WSL on Windows, I sometimes have to kill the "WSL" process between steps in order to free up RAM.
On Linux, I'm not able to help you, sorry :(

How many pictures do you train on?