Fictionarry/DNGaussian

The dimension is wrong when calculating the loss

Closed this issue · 4 comments

Thanks for your fancy work~
I ran into a problem while reproducing your work on the LLFF dataset: the dimension is wrong when calculating the loss.
Generating random point cloud (5024)... [24/03 11:26:18]
Loading Training Cameras [1.0] [24/03 11:26:18]
Loading Test Cameras [1.0] [24/03 11:26:20]
Loading Eval Cameras [1.0] [24/03 11:26:22]
Number of points at initialisation : 5024 [24/03 11:26:22]
Reading camera 180/180 [24/03 11:26:22]
Loading Render Cameras [1.0] [24/03 11:26:22]
Training progress: 0%| | 0/6000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train_llff.py", line 406, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, args.near)
File "train_llff.py", line 161, in training
loss = Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
File "/mnt/star/3DGS/DNGaussian-main/utils/loss_utils.py", line 130, in ssim
return _ssim(img1, img2, window, window_size, channel, size_average)
File "/mnt/star/3DGS/DNGaussian-main/utils/loss_utils.py", line 134, in _ssim
mu1 = F.conv2d(img1, window, padding=window_size // 2, groups=channel)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [3, 1, 11, 11], but got 3-dimensional input of size [3, 378, 504] instead

Hi, I guess this problem is related to the PyTorch version. A simple solution can be found here: graphdeco-inria/gaussian-splatting#320 (comment)
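For readers hitting the same error: the workaround in the linked comment amounts to giving the CHW image a batch dimension before it reaches SSIM, since `F.conv2d` with a 4-D weight like `[3, 1, 11, 11]` expects 4-D NCHW input. A minimal sketch of the shape fix, using NumPy arrays as a stand-in for torch tensors (the helper name `ensure_batched` is illustrative, not part of the repo):

```python
import numpy as np

def ensure_batched(img):
    """Prepend a batch axis to a CHW image so it becomes NCHW."""
    if img.ndim == 3:
        img = img[np.newaxis, ...]  # [C, H, W] -> [1, C, H, W]
    return img

# The rendered LLFF image from the traceback has shape [3, 378, 504].
chw = np.zeros((3, 378, 504), dtype=np.float32)
print(ensure_batched(chw).shape)  # (1, 3, 378, 504)
```

In PyTorch the equivalent is `img.unsqueeze(0)` (or the `view` call shown below); newer PyTorch versions may avoid the issue entirely because the training loop there already produces batched tensors.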

I added the two lines of code from (graphdeco-inria/gaussian-splatting#320 (comment)), but it raised another error:

img1 = img1.view(1, 3, img1.shape[1], img1.shape[2])
img2 = img2.view(1, 3, img2.shape[1], img2.shape[2])

Traceback (most recent call last):
File "train_llff.py", line 407, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, args.near)
File "train_llff.py", line 174, in training
loss.backward()
File "/home/starak/anaconda3/envs/3dgs/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/starak/anaconda3/envs/3dgs/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

CUDA 11.1, PyTorch 1.9.1


Well, I can't reproduce this problem when using PyTorch 1.12.1, as recommended in this repo. I think a higher PyTorch version would solve this problem.

I solved this problem by using PyTorch 1.12.1. Thank you very much.