NVlabs/InstantSplat

Interactive Viewer

sp77-1 opened this issue · 15 comments

How can I use an interactive viewer to show the final .ply file?
Thank you for any tips.

Can I use the viewer from the original Gaussian Splatting repository?

How can I use an interactive viewer to show the final .ply file?
Thank you for any tips.

(1) Follow the instructions in the vanilla Gaussian Splatting (VanillaGS) repo: https://github.com/graphdeco-inria/gaussian-splatting?tab=readme-ov-file#interactive-viewers
(2) You can also import the final gaussian.ply into SuperSplat for visualization: https://playcanvas.com/supersplat/editor
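Before loading a file into any of these viewers, it can help to check what the .ply actually contains. The following is a minimal sketch using only the Python standard library; it reads the PLY header and lists the vertex properties. The function name is hypothetical and the sample file is a toy example; a real Gaussian Splatting export typically carries more properties (for example scale, rotation, and color coefficients).

```python
import os
import tempfile

def read_ply_header(path):
    """Return (format, vertex_count, property_names) parsed from a PLY header."""
    fmt, vertex_count, props = None, 0, []
    in_vertex = False
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            parts = raw.decode("ascii").split()
            if not parts:
                continue
            if parts[0] == "end_header":
                break  # stop before the (possibly binary) body
            if parts[0] == "format":
                fmt = parts[1]
            elif parts[0] == "element":
                in_vertex = parts[1] == "vertex"
                if in_vertex:
                    vertex_count = int(parts[2])
            elif parts[0] == "property" and in_vertex:
                props.append(parts[-1])
    return fmt, vertex_count, props

# Demo on a tiny ASCII PLY written to a temporary file (illustrative properties only).
sample = (b"ply\nformat ascii 1.0\nelement vertex 1\n"
          b"property float x\nproperty float y\nproperty float z\n"
          b"property float opacity\nend_header\n0 0 0 1\n")
with tempfile.NamedTemporaryFile(suffix=".ply", delete=False) as tmp:
    tmp.write(sample)
fmt, count, props = read_ply_header(tmp.name)
os.remove(tmp.name)
print(fmt, count, props)
```

If the vertex count or property list looks unexpected, the problem is likely in the export rather than in the viewer.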

This is great work and I am very interested.
In my experiments, the final rendered video from render_by_interp.py is of high quality.

However, when I view PointCloud.ply in a 3D viewer such as SIBR Viewer or SuperSplat, some of the scenes are broken.
My investigation shows that the rasterization process in gaussian_renderer/__init__.py applies a transformation to Gaussian parameters such as xyz and rotation.
SIBR Viewer, on the other hand, does not apply these transformations, which may explain the difference in image quality.

Is there any way to resolve this issue?
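To illustrate why such a mismatch would matter, here is a minimal NumPy sketch. It is not InstantSplat's code: it only assumes that some rigid transform (R, t) is applied to the Gaussians before rasterization, and it represents each Gaussian's orientation and scale as a 3x3 covariance matrix. The function name and the toy values are hypothetical.

```python
import numpy as np

def transform_gaussians(means, covs, R, t):
    """Apply the rigid transform x -> R @ x + t to Gaussian means and covariances.

    means: (N, 3), covs: (N, 3, 3), R: (3, 3) rotation, t: (3,) translation.
    """
    new_means = means @ R.T + t
    new_covs = R @ covs @ R.T  # matmul broadcasts over the leading N axis
    return new_means, new_covs

# 90-degree rotation about the z axis plus a unit shift along z.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 1.0])

# One Gaussian on the x axis, elongated along x.
means = np.array([[1.0, 0.0, 0.0]])
covs = np.array([np.diag([4.0, 1.0, 1.0])])

new_means, new_covs = transform_gaussians(means, covs, R, t)
print(new_means)    # the mean moves to (0, 1, 1)
print(new_covs[0])  # the long axis now points along y
```

A viewer that reads the stored, untransformed parameters would draw this Gaussian at (1, 0, 0) elongated along x, while a renderer that applies the transform would draw it at (0, 1, 1) elongated along y.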

Hi @d-irie, thanks for your interest! Please see issue #18.

In my experiments, the final rendered video from render_by_interp.py is of high quality.

However, when I view PointCloud.ply in a 3D viewer such as SIBR Viewer or SuperSplat, some of the scenes are broken.

I am experiencing the same issue. The outputted point_cloud.ply data viewed in SuperSplat doesn't look as good visually as the rendered .mp4 video.

I had already changed line 154 of submodules/diff-gaussian-rasterization/cuda_rasterizer/auxiliary.h from p_view.z <= 0.2f to p_view.z <= 0.001f, as suggested in issue #18. Is there something I'm missing?

Video
Screenshot from 2024-11-13 15-19-06

SuperSplat
Screenshot from 2024-11-13 15-18-31
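For context on the threshold change mentioned above: the comparison on p_view.z acts as a near-plane cutoff, so points whose view-space depth is at or below the threshold are skipped. The NumPy sketch below mimics only that comparison; the helper name and the depth values are made up for illustration and are not taken from the CUDA source.

```python
import numpy as np

def kept_by_near_plane(z_view, near):
    """Return a boolean mask of points that survive the cutoff (depth > near)."""
    return z_view > near

# Hypothetical view-space depths of five Gaussians.
z_view = np.array([0.0005, 0.05, 0.15, 0.5, 2.0])

kept_default = kept_by_near_plane(z_view, 0.2)    # threshold before the change
kept_relaxed = kept_by_near_plane(z_view, 0.001)  # threshold after the change
print(int(kept_default.sum()), int(kept_relaxed.sum()))
```

With the smaller threshold, Gaussians very close to the camera are rendered instead of culled. A third-party viewer has its own near-plane setting, so this change affects only the repository's renderer, not how the .ply looks elsewhere.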

Hi @nirmalsnair,

Can you help me analyze the cause of the problem?

I trained on 38 images extracted from a video I shot myself, and the final output video looks good:
Screenshot 2024-11-15 235315

But strangely, when I open point_cloud.ply in SuperSplat, it looks like this:

Screenshot 2024-11-15 234731

@crmlei Hi, you should click the button in the red box ⬇️
image

@nirmalsnair Hi, maybe you can try to visualize it with nerf_viewers, or Viser, or SIBR_viewers of 3D-GS.

@kairunwen Thank you for the suggestion. I tried viewing the data using Viser (via experimental/gaussian_splats.py), but the quality is still quite poor.

Screenshot from 2024-11-28 13-54-21

@nirmalsnair Hi, you could try increasing the number of training iterations here: https://github.com/NVlabs/InstantSplat/blob/main/scripts/run_infer.sh#L19.

Hi @kairunwen, thank you for the suggestion. I tried increasing the iterations from 1,000 to 10,000 and 30,000. While the PSNR improved significantly (from 33.23 at 1,000 iterations to 42.02 at 7,000 iterations, and further to 44.91 at 30,000 iterations), the visual quality noticeably degraded rather than improved.

The run log is attached below. Any idea what might be causing this issue?

Run log for 30,000 iterations
(instantsplat) nirmal@Raider:~/Documents/InstantSplat$ bash scripts/run_train_infer.sh
========= santorini: Dust3r_coarse_geometric_initialization =========
... loading model from submodules/dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
instantiating : AsymmetricCroCo3DStereo(enc_depth=24, dec_depth=12, enc_embed_dim=1024, dec_embed_dim=768, enc_num_heads=16, dec_num_heads=12, pos_embed='RoPE100', patch_embed_cls='PatchEmbedDust3R', img_size=(512, 512), head_type='dpt', output_mode='pts3d', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), landscape_only=False)
<All keys matched successfully>
>> Loading images from /home/nirmal/Documents/InstantSplat/data/sora/santorini/3_views/images
 - adding frame_00.jpg with resolution 1920x1080 --> 512x288
 - adding frame_06.jpg with resolution 1920x1080 --> 512x288
 - adding frame_12.jpg with resolution 1920x1080 --> 512x288
 (Found 3 images)
ori_size (1920, 1080)
>> Inference with model on 6 image pairs
100%|██████████| 6/6 [00:00<00:00,  8.77it/s]
 init edge (0*,1*) score=np.float64(9.563681602478027)
 init edge (1,2*) score=np.float64(8.826717376708984)
 (setting focal #0 = 440.46165974934894)
 (setting focal #1 = 440.46165974934894)
 (setting focal #2 = 440.46165974934894)
 init loss = 0.027959637343883514
Global alignement - optimizing for:
['pw_poses', 'im_depthmaps', 'im_poses', 'im_conf.0', 'im_conf.1', 'im_conf.2']
100%|██████████| 300/300 [00:03<00:00, 77.20it/s, lr=0.01 loss=0.00364887]
Time taken for 3 views: 4.721898317337036 seconds
scale factor is not same for x3.75 and y 3.75
scale factor is not same for x3.75 and y 3.75
scale factor is not same for x3.75 and y 3.75
========= santorini: Train: jointly optimize pose =========
Optimizing ./output/infer/sora/santorini/3_views_30000Iter_1xPoseLR/
Output folder: ./output/infer/sora/santorini/3_views_30000Iter_1xPoseLR/
Reading camera 3/3
Loading Training Cameras
[ INFO ] Encountered quite large input images (>1.6K pixels width), rescaling to 1.6K.
 If this is not desired, please explicitly specify '--resolution/-r' as 1
train_camera_num:  3
Loading Test Cameras
test_camera_num:  0
Number of points at initialisation :  356778
Training progress:   2%|▎         | 500/30000 [00:25<22:49, 21.54it/s, Loss=0.0222455]
[ITER 500] Evaluating train: L1 0.012928157113492489 PSNR 29.113671620686848
Training progress:   3%|▎         | 800/30000 [00:41<21:46, 22.35it/s, Loss=0.0153423]
[ITER 800] Evaluating train: L1 0.009307358103493849 PSNR 31.9732723236084
Training progress:   3%|▎         | 1000/30000 [00:50<21:25, 22.56it/s, Loss=0.0133920]
[ITER 1000] Evaluating train: L1 0.008184022270143032 PSNR 33.19559605916341
Training progress:   5%|▌         | 1500/30000 [01:13<20:41, 22.96it/s, Loss=0.0094947]
[ITER 1500] Evaluating train: L1 0.006391201789180437 PSNR 35.51490783691406
Training progress:   7%|▋         | 2000/30000 [01:36<20:20, 22.94it/s, Loss=0.0076353]
[ITER 2000] Evaluating train: L1 0.005400731693953276 PSNR 36.96710077921549
Training progress:  10%|█         | 3000/30000 [02:21<19:23, 23.20it/s, Loss=0.0060085]
[ITER 3000] Evaluating train: L1 0.004438495961949229 PSNR 38.91851806640625
Training progress:  13%|█▎        | 4000/30000 [03:06<18:57, 22.85it/s, Loss=0.0049146]
[ITER 4000] Evaluating train: L1 0.0037851009983569384 PSNR 40.39331309000651
Training progress:  17%|█▋        | 5000/30000 [03:51<18:16, 22.80it/s, Loss=0.0044015]
[ITER 5000] Evaluating train: L1 0.0035195722399900355 PSNR 41.246222178141274
Training progress:  20%|██        | 6000/30000 [04:36<17:25, 22.95it/s, Loss=0.0040494]
[ITER 6000] Evaluating train: L1 0.0032493871791909137 PSNR 41.90483856201172
Training progress:  23%|██▎       | 7000/30000 [05:20<16:40, 22.98it/s, Loss=0.0039207]
[ITER 7000] Evaluating train: L1 0.0031754142449547844 PSNR 42.31662623087565
Training progress: 100%|██████████| 30000/30000 [21:44<00:00, 23.00it/s, Loss=0.0027425]

[ITER 30000] Evaluating train: L1 0.00244275185589989 PSNR 44.90543111165364

[ITER 30000] Saving Gaussians

Training complete.
========= santorini: Render interpolated pose & output video =========
Looking for config file in ./output/infer/sora/santorini/3_views_30000Iter_1xPoseLR/cfg_args
Config file found: ./output/infer/sora/santorini/3_views_30000Iter_1xPoseLR/cfg_args
Rendering ./output/infer/sora/santorini/3_views_30000Iter_1xPoseLR/
Loading trained model at iteration 30000
Reading camera 200/200
Loading Training Cameras
[ INFO ] Encountered quite large input images (>1.6K pixels width), rescaling to 1.6K.
 If this is not desired, please explicitly specify '--resolution/-r' as 1
train_camera_num:  200
Loading Test Cameras
test_camera_num:  200
Rendering progress: 100%|██████████| 200/200 [00:52<00:00,  3.81it/s]
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (1600, 900) to (1600, 912) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).

1,000 iterations

1000 iterations

10,000 iterations

10000 iterations

30,000 iterations

30000 iterations
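When reading the PSNR values in the log above, note that they are labeled "Evaluating train", so they measure how closely the training views are reproduced. The sketch below shows the standard PSNR formula, assuming images scaled to [0, 1]; it is a generic illustration, not the repository's evaluation code, and the toy images are made up.

```python
import numpy as np

def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((rendered - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: every pixel is off by 0.1, so the MSE is 0.01.
target = np.zeros((4, 4, 3))
rendered = np.full((4, 4, 3), 0.1)
value = psnr(rendered, target)
print(round(float(value), 2))  # about 20 dB
```

Since the metric compares renders only against the images it was given, a rising training PSNR by itself does not show that the scene looks better from other viewpoints.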

@nirmalsnair Hi, we have updated the code to match the preprint. You can first run scripts/run_infer.sh, and then set the training iterations to around 1,500. According to your log, the training-set PSNR already reaches 35+ at that point, and increasing the iterations further may lead to overfitting.
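The iteration count suggested above can be read off the training log mechanically. The sketch below parses lines in the "[ITER N] Evaluating train" format shown in the run log and returns the first iteration whose training PSNR reaches a target. The function name is hypothetical, and the 35 dB target is simply the figure mentioned in this comment, not a general rule.

```python
import re

# Matches log lines such as:
# [ITER 1500] Evaluating train: L1 0.0063... PSNR 35.51...
ITER_RE = re.compile(r"\[ITER (\d+)\] Evaluating train: L1 (\S+) PSNR (\S+)")

def first_iter_reaching(log_text, psnr_target):
    """Return (iteration, psnr) for the first evaluation at or above the target."""
    for match in ITER_RE.finditer(log_text):
        iteration, psnr = int(match.group(1)), float(match.group(3))
        if psnr >= psnr_target:
            return iteration, psnr
    return None  # the target was never reached

# Excerpt from the run log posted earlier in this thread.
log_text = """\
[ITER 500] Evaluating train: L1 0.012928157113492489 PSNR 29.113671620686848
[ITER 800] Evaluating train: L1 0.009307358103493849 PSNR 31.9732723236084
[ITER 1000] Evaluating train: L1 0.008184022270143032 PSNR 33.19559605916341
[ITER 1500] Evaluating train: L1 0.006391201789180437 PSNR 35.51490783691406
[ITER 2000] Evaluating train: L1 0.005400731693953276 PSNR 36.96710077921549
"""
result = first_iter_reaching(log_text, 35.0)
print(result)
```

On this excerpt the first evaluation at or above 35 dB is iteration 1500, which matches the suggestion.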

Thanks, @kairunwen! The updated code resolves the issue, and the splat now looks significantly better.

updated

In my experiments, the final rendered video from render_by_interp.py is of high quality.
However, when I view PointCloud.ply in a 3D viewer such as SIBR Viewer or SuperSplat, some of the scenes are broken.

I am experiencing the same issue. The outputted point_cloud.ply data viewed in SuperSplat doesn't look as good visually as the rendered .mp4 video.

I had already changed line 154 of submodules/diff-gaussian-rasterization/cuda_rasterizer/auxiliary.h from p_view.z <= 0.2f to p_view.z <= 0.001f, as suggested in issue #18. Is there something I'm missing?

Video Screenshot from 2024-11-13 15-19-06

SuperSplat Screenshot from 2024-11-13 15-18-31

Have you figured out why there's such a big discrepancy between the rendered video and the actual splats? They were generated with the same number of iterations and the same settings, so what was the cause?

@pengcanon I'm not sure what's causing the issue since I haven't had a chance to review the code, but the update resolves it.