CUDA out of memory

Question

Opened this issue 2 years ago · 8 comments

Hello, I've been training for a while, but an error is reported partway through. Is there any way to solve this problem without changing the graphics card?

scene data use female smpl
/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811806235/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
camera ang threshold is 0.010285
box:
[-0.7080196142196655, -1.2795634269714355, -0.3215314447879791]
[0.7120546102523804, 0.7051210403442383, 0.3668109178543091]
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:246: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
stride = (self.resolutions[-1] - 1) // (resolution - 1)
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:261: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
coords_accum = coords // stride
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:341: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
voxels = coords // stride
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:381: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
point_coords = coords // stride
/home/xds/project/SelfReconCode/MCAcc/seg3d_lossless.py:417: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
voxels = coords // stride
Traceback (most recent call last):
File "train.py", line 167, in <module>
loss=optNet(outs,sample_pix_num,ratio,frame_ids,debug_root)
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xds/project/SelfReconCode/model/network.py", line 502, in forward
total_loss=self.computeTmpPcLoss(defMeshes,[d_cond,[poses,trans]],masks,mgtMs,ratio)
File "/home/xds/project/SelfReconCode/model/network.py", line 687, in computeTmpPcLoss
loss.backward()
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/autograd/function.py", line 199, in apply
return user_fn(self, *args)
File "/home/xds/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/compositing.py", line 56, in backward
grad_features, grad_alphas = _C.accum_alphacomposite_backward(
RuntimeError: CUDA out of memory. Tried to allocate 668.00 MiB (GPU 0; 10.76 GiB total capacity; 8.00 GiB already allocated; 443.38 MiB free; 8.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Answer 1 · 2022-07-18T13:26:43.000Z

The default config requires a lot of memory, and an RTX 3090 is recommended. You can lower the marching cubes resolutions to reduce memory use, but the related optimization parameters then need to be readjusted as well, which is a little tedious.

Answer 2 · 2022-07-18T14:16:37.000Z

Thank you! I will try adjusting the parameters and hope it works.

Answer 3 · 2022-07-18T14:18:10.000Z

Do you know how much memory is needed?

Answer 4 · 2022-07-18T17:51:03.000Z

Almost 24 GB.

Answer 5 · 2022-10-11T00:10:28.000Z

I'm using a GeForce RTX 3070 Laptop GPU and got the same kind of error, shown below.
I edited config.conf, reducing "sample_pix_num", "num_workers", and "batch_size", but the error persisted in every case.
Which parameters should I edit to avoid the CUDA out of memory error?

error message

$ CUDA_VISIBLE_DEVICES=0 python train.py --gpu-ids 0 --conf config.conf --data $ROOT/female-3-casual --save-folder result
scene data use female smpl
/home/mas/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811806235/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
File "train.py", line 98, in <module>
optNet,sdf_initialized=getOptNet(dataset,batch_size,bmins,bmaxs,resolutions['coarse'],device,config,use_initial_sdf)
File "/home/mas/proj/study/computer_vision/SelfReconCode/model/network.py", line 850, in getOptNet
skinner,tmpBodyVs,tmpBodyFs=initialLBSkinner(dataset.gender,dataset.shape.to(device),initPose,(128+1, 224+1, 64+1),bmins,bmaxs)
File "/home/mas/proj/study/computer_vision/SelfReconCode/model/Deformer.py", line 294, in initialLBSkinner
ws=compute_lbswField(bmins,bmaxs,resolution,verts.view(6890,3),smpl.weight.view(6890,24),align_corners=False,mean_neighbor=30,smooth_times=30)
File "/home/mas/proj/study/computer_vision/SelfReconCode/model/Deformer.py", line 269, in compute_lbswField
dists,indices=(tmp[:,None,:]-smpl_verts[None,:,:]).norm(dim=-1).topk(mean_neighbor,dim=-1,largest=False)
File "/home/mas/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/_tensor.py", line 442, in norm
return torch.norm(self, p, dim, keepdim, dtype=dtype)
File "/home/mas/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/functional.py", line 1442, in norm
return _VF.frobenius_norm(input, _dim, keepdim=keepdim)
RuntimeError: CUDA out of memory. Tried to allocate 1.29 GiB (GPU 0; 7.80 GiB total capacity; 5.22 GiB already allocated; 724.12 MiB free; 5.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
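
The failing line in compute_lbswField broadcasts a block of query points against all 6890 SMPL vertices before taking the norm, so the temporary tensor grows linearly with the number of query points handled at once. A rough back-of-the-envelope estimate is sketched below. It assumes float32 tensors (the dtype is not shown in the log), and the helper names are hypothetical, not part of SelfRecon:

```python
SMPL_VERTS = 6890        # from verts.view(6890, 3) in the traceback
BYTES_PER_FLOAT = 4      # assumption: float32 tensors


def pairwise_diff_bytes(num_points: int) -> int:
    """Bytes held by the (num_points, 6890, 3) difference tensor."""
    return num_points * SMPL_VERTS * 3 * BYTES_PER_FLOAT


def max_points_for_budget(budget_bytes: int) -> int:
    """Largest block of query points whose difference tensor fits the budget."""
    return budget_bytes // (SMPL_VERTS * 3 * BYTES_PER_FLOAT)


if __name__ == "__main__":
    # Grid resolution passed to initialLBSkinner in the traceback.
    full_grid = (128 + 1) * (224 + 1) * (64 + 1)
    print(f"grid points: {full_grid}")
    print(f"all at once: {pairwise_diff_bytes(full_grid) / 2**30:.1f} GiB")
    print(f"points per 512 MiB block: {max_points_for_budget(512 * 2**20)}")
```

The log does not show how many points `tmp` holds per call, so this only illustrates the scaling: each query point costs about 81 KiB in the difference tensor alone, and processing fewer points per block would shrink the allocation proportionally.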

Answer 6 · 2022-12-30T12:21:57.000Z

zhihu
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
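
The error text itself suggests max_split_size_mb only when reserved memory is much larger than allocated memory, which points to fragmentation. A small sketch for reading that gap off the message is given below. The function name is hypothetical, and the regular expression assumes the message wording seen in the logs above:

```python
import re

_SIZE = r"([\d.]+) (MiB|GiB)"
_PATTERN = re.compile(
    rf"{_SIZE} already allocated; {_SIZE} free; {_SIZE} reserved"
)
_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def reserved_minus_allocated_mib(message: str) -> float:
    """Return reserved minus allocated memory (in MiB) from a PyTorch OOM message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognised CUDA OOM message")
    alloc, alloc_unit, _free, _free_unit, reserved, reserved_unit = match.groups()
    return float(reserved) * _TO_MIB[reserved_unit] - float(alloc) * _TO_MIB[alloc_unit]


if __name__ == "__main__":
    first = "8.00 GiB already allocated; 443.38 MiB free; 8.18 GiB reserved in total by PyTorch"
    second = "5.22 GiB already allocated; 724.12 MiB free; 5.23 GiB reserved in total by PyTorch"
    for msg in (first, second):
        print(f"{reserved_minus_allocated_mib(msg):.2f} MiB reserved but not allocated")
```

In both logs in this thread the gap is small compared with the requested allocation (668.00 MiB and 1.29 GiB), so the allocator setting may not be enough on its own here. If it is tried anyway, the variable should be set before the first CUDA allocation, since PyTorch reads it when the caching allocator is initialised.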

Answer 7 · 2022-12-30T23:32:02.000Z

Thank you, will try.

Answer 8 · 2022-12-31T00:33:33.000Z

I put

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

at line 27 in train.py, and ran

CUDA_VISIBLE_DEVICES=0 python train.py --gpu-ids 0 --conf config.conf --data $ROOT/female-3-casual --save-folder result

But it failed with "Segmentation fault (core dumped)" ...