spla-tam/SplaTAM

Question regarding the necessity of viewing cameras and relative pose transformations

Wulizhou888 opened this issue · 1 comment

Thanks for your great work!

I'm curious why we don't track the absolute pose directly instead of the pose relative to the viewing camera. I've noticed that the "transform_to_frame" function incurs significant latency in both the forward and backward passes. It seems that rel_w2c could be fused with first_frame_w2c into the viewmatrix passed to the CUDA kernel; I suspect this fusion could save a significant number of elementwise operations in PyTorch.
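To illustrate the fusion I have in mind, here is a minimal sketch (the helper name `fuse_view_matrix` is hypothetical, and the composition order depends on SplaTAM's w2c convention): composing the two 4x4 transforms once per frame is equivalent to transforming every Gaussian through both matrices sequentially.

```python
import torch

def fuse_view_matrix(rel_w2c: torch.Tensor, first_frame_w2c: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: compose the tracked relative transform with the
    first frame's world-to-camera matrix into a single 4x4 view matrix, so
    the CUDA rasterizer can consume it directly instead of every Gaussian
    being transformed in PyTorch first."""
    return rel_w2c @ first_frame_w2c

def random_rigid() -> torch.Tensor:
    # Random rigid 4x4 transform: rotation via QR (flip a column if the
    # determinant is -1) plus a random translation.
    q, _ = torch.linalg.qr(torch.randn(3, 3, dtype=torch.float64))
    if torch.det(q) < 0:
        q[:, 0] = -q[:, 0]
    t = torch.eye(4, dtype=torch.float64)
    t[:3, :3] = q
    t[:3, 3] = torch.randn(3, dtype=torch.float64)
    return t

# Sanity check: applying the fused matrix to homogeneous points matches
# applying the two transforms one after the other.
rel, first = random_rigid(), random_rigid()
pts = torch.cat([torch.randn(5, 3, dtype=torch.float64),
                 torch.ones(5, 1, dtype=torch.float64)], dim=1)
sequential = (rel @ (first @ pts.T)).T
fused = (fuse_view_matrix(rel, first) @ pts.T).T
assert torch.allclose(sequential, fused)
```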

Another question: why is the scale parameter of the Gaussians stored as its logarithm and then exponentiated for rendering (as shown in the screenshots below)?

[screenshots of the log/exp scale code omitted]

It seems to have no mathematical effect while incurring extra elementwise operations.

Looking forward to your reply. Thanks a lot.

Hi, thanks for your interest in our work!

Yes, you are right; using a CUDA kernel should make the operation faster. We are currently working on speed optimizations for SplaTAM.

The main effect is on the effective learning rate and how the Gaussian parameters (particularly scale and position) are updated by gradient descent. We follow this convention from the original 3D Gaussian Splatting implementation.
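To make this concrete, here is a toy sketch (not the SplaTAM code) of why the log parameterization changes the optimization even though exp(log(s)) = s: the gradient with respect to the log-scale is proportional to the scale itself, so a fixed learning rate yields multiplicative updates, and the rendered scale can never become negative.

```python
import torch

# Toy sketch (not the SplaTAM code): the optimizer stores log-scales, and
# the renderer applies exp() to obtain strictly positive scales.
log_scales = torch.log(torch.tensor([0.01, 1.0])).requires_grad_()
loss = torch.exp(log_scales).sum()  # stand-in for a loss on the rendered scales
loss.backward()

# d(loss)/d(log_scale) = exp(log_scale) = the scale itself, so small
# Gaussians receive proportionally small updates.
print(log_scales.grad)  # ≈ tensor([0.0100, 1.0000])
```

A gradient step `log_s ← log_s - lr * g` corresponds to `s ← s * exp(-lr * g)`: a multiplicative update that cannot cross zero, which is why the convention is kept despite the extra elementwise exp.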