Question on gradient of depth
chenyuntc opened this issue · 16 comments
Thanks for the great work! This work is awesome.
I noticed that the gradient is not backpropagated from pred_depth. Do you plan to add it? This would be super helpful for those who want to add geometry supervision.
It doesn't sound very difficult. I can try to do it if I have time... Also, contributions are welcome!
Thank you @wanmeihuali, I quickly skimmed the code, and it seems challenging to me 😂: we need to propagate the gradient from predicted_depth to alpha (and then to point_uv) and to point_depth, and then merge them to get the gradient of the covariance and point location.
Let me know if my understanding is not correct. I would be happy to give it a try if there is a simple way to implement it, though I haven't written taichi before.
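For context, here is a sketch of the quantities involved, assuming the standard alpha-compositing depth used in 3D Gaussian splatting (my assumption, not necessarily the exact form used in this repo):
$$d_{pred} = \sum_{i=1}^{n} d_i \alpha_i T_i, \qquad T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)$$
so the backward pass would need $\partial d_{pred} / \partial \alpha_i$ (which flows on to point_uv and the covariance) and $\partial d_{pred} / \partial d_i$ (which flows to the point location).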
I can provide some ideas; please correct me if I am wrong.
in which
If we only want geometry supervision for point location, then optimize point depth
in which
in the code, we already have:
so we can define a new variable to store the value of
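The inline equations in the comment above were lost, but as a hedged reconstruction of the simplification being described (based on the compositing formula above, not necessarily the author's exact expression): if we only need the gradient with respect to each point depth $d_i$ (and not $\alpha_i$), then
$$\frac{\partial d_{pred}}{\partial d_i} = \alpha_i T_i,$$
which is exactly the per-point blending weight already computed in the forward pass, so it only needs to be stored in one extra variable.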
I created a draft PR (#122) that modifies the rasterizer (taichi) part to support gradients from depth. The code is not tested (I don't have any test dataset either...), and the PyTorch part will also need to be modified. You can start from there to modify the code and do experiments. @chenyuntc
I tested it this morning, and it works for the points' coordinates! Thank you so much!
import taichi as ti
from taichi_3d_gaussian_splatting.GaussianPointCloudRasterisation import GaussianPointCloudRasterisation
import torch
import numpy as np
from taichi_3d_gaussian_splatting.Camera import CameraInfo
from taichi_3d_gaussian_splatting.utils import se3_to_quaternion_and_translation_torch

RasterConifg = GaussianPointCloudRasterisation.GaussianPointCloudRasterisationConfig


def render(pts, pts_feat, c2w, intrin, HW):
    rasterisation = GaussianPointCloudRasterisation(
        config=RasterConifg(near_plane=0.4, far_plane=2000.0,
                            depth_to_sort_key_scale=10.0, rgb_only=False),
    )
    camera_info = CameraInfo(camera_intrinsics=intrin.to(pts.device),
                             camera_height=HW[0], camera_width=HW[1],
                             camera_id=0)
    q_pointcloud_camera, t_pointcloud_camera = se3_to_quaternion_and_translation_torch(c2w[None])
    gaussian_input = GaussianPointCloudRasterisation.GaussianPointCloudRasterisationInput(
        point_cloud=pts.float(),
        point_cloud_features=pts_feat.cuda(),
        point_object_id=torch.zeros(pts.shape[0], dtype=torch.int32, device=pts.device),
        point_invalid_mask=torch.zeros(pts.shape[0], dtype=torch.int8, device=pts.device),
        camera_info=camera_info,
        q_pointcloud_camera=q_pointcloud_camera.cuda().contiguous(),
        t_pointcloud_camera=t_pointcloud_camera.cuda().contiguous(),
        color_max_sh_band=6,
    )
    res = rasterisation(gaussian_input)
    return res


if __name__ == '__main__':
    ti.init(arch=ti.cuda, device_memory_GB=0.1)
    pts = torch.randn(10000, 3, device='cuda').requires_grad_(True)
    # 56 features per point; presumably quaternion (0:4), scale (4:7), alpha (7), SH coefficients (8:56)
    pts_feat = torch.zeros(pts.shape[0], 56, device=pts.device)
    pts_feat[:, 0:4] = torch.rand_like(pts_feat[:, 0:4])
    pts_feat[:, 0:4] = pts_feat[:, 0:4] / torch.norm(pts_feat[:, 0:4], dim=1, keepdim=True)
    pts_feat[:, 4:7] = torch.zeros(pts.shape[0], 1, device=pts.device).repeat(1, 3)
    pts_feat[:, 7] = 0.05
    pts_feat[:, 8] = 1.0
    pts_feat[:, 9:24] = 0.0
    pts_feat[:, 24] = 1.0
    pts_feat[:, 25:40] = 0.0
    pts_feat[:, 40] = 1.0
    pts_feat[:, 41:56] = 0.0
    pts_feat.requires_grad_(True)
    c2w = torch.eye(4, device='cuda')
    HW = [1080 // 64 * 16, 1920 // 64 * 16]
    intrin = torch.Tensor([[100, 0, 200], [0, 100, 200], [0, 0, 1]]).cuda()
    iteration = 0
    import tqdm
    optimizer = torch.optim.Adam([pts, pts_feat], lr=0.01)
    for ii in tqdm.trange(100000):
        optimizer.zero_grad()
        res = render(pts, pts_feat, c2w, intrin, HW)
        # res[1] is (presumably) the rendered depth map; supervise covered pixels towards depth 3
        mask = res[1] > 0
        loss = (res[1] - 3).abs()[mask].mean()
        loss.backward()
        optimizer.step()
        if ii % 500 == 0:
            # save_img(res[1])
            # plot3d(pts)
            print(loss)
Thanks @chenyuntc! If you further prove that the geometry supervision for
Hello @wanmeihuali, while I was trying to test the supervision for
I derived the gradient on my own, and it seems different from the one you showed.
Given
We have
Let:
we convert
We have
that is
Because
I haven't done this kind of calculus for a while, so I'm not sure whether it is correct, but this result is more intuitive to me as I can see the gradient is relative to the
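The rendered equations in the derivation above did not survive, so for reference here is a hedged reconstruction of the standard derivation (my assumption about what is being computed, not necessarily the author's exact steps). Given $d_{pred} = \sum_{i=1}^{n} d_i \alpha_i T_i$ with $T_i = \prod_{j=1}^{i-1}(1-\alpha_j)$, we have $\partial T_j / \partial \alpha_i = -T_j/(1-\alpha_i)$ for $j > i$ and $0$ for $j \le i$, so
$$\frac{\partial d_{pred}}{\partial \alpha_i} = d_i T_i - \sum_{j=i+1}^{n} \frac{d_j \alpha_j T_j}{1-\alpha_i}.$$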
Hi @chenyuntc, I see what happened. In my calculation, I treat
In your calculation, I'm wondering why you are calculating
Which is actually your result divided by g... Anyway, have you tried your equation? Maybe the calculation does not need to be mathematically correct; it's more like designing the loss. I mean, as long as the gradient penalizes the alpha of points far from
Yes, you are correct. Your final derivation looks correct to me.
In my derivation, the
I haven't tried the implementation in taichi yet as I'm not familiar with taichi. It would be appreciated if you could try it when you are available.
Hello @wanmeihuali ,
- While going over the existing implementation, I have one question: the $w_i$ in the code seems to be
$$w_i = \sum_{j=1}^{i}{color_j \alpha_j T_j}$$
rather than
$$w_i = \sum_{j=i+1}^{n}{\frac{color_j \alpha_j T_j}{1-\alpha_i}}$$
as you mentioned (note the superscript is $i$ instead of $n$), right? Unless it's in inverse order?
- You may ignore this one if it's in inverse order. What I did here is try to figure out how to implement it in forward order. I note that
where
That means we need to store pixel_accumulated_alpha
(accumulated_depth
(rasterized_depth
(
Not sure whether it can help simplify the implementation 🤔
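The math in the note above was lost in extraction, but here is a hedged sketch of the identity I believe it refers to (my reconstruction; variable names such as accumulated_depth and prefix_depth are illustrative, not necessarily the rasterizer's): the total accumulated depth is known once the forward pass finishes, so the suffix sum needed for $\partial d_{pred}/\partial \alpha_i$ can be obtained in forward order by subtracting a running prefix.

```python
import torch

def depth_alpha_grad_forward_order(depth, alpha):
    """d(d_pred)/d(alpha_i) for one ray, computed front to back.

    depth, alpha: 1-D tensors for the sorted points along one ray, with
    d_pred = sum_i depth_i * alpha_i * T_i and T_i = prod_{j<i}(1 - alpha_j).
    """
    T = torch.cumprod(torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    contrib = depth * alpha * T                      # per-point depth contribution
    accumulated_depth = contrib.sum()                # known once the forward pass is done
    prefix_depth = torch.cumsum(contrib, dim=0)      # sum_{j<=i} depth_j * alpha_j * T_j
    suffix_depth = accumulated_depth - prefix_depth  # sum_{j>i}  depth_j * alpha_j * T_j
    # d(d_pred)/d(alpha_i) = depth_i * T_i - suffix_i / (1 - alpha_i)
    return depth * T - suffix_depth / (1.0 - alpha)

# quick autograd check of the closed form
alpha = (torch.rand(8) * 0.9).requires_grad_(True)
depth = torch.rand(8) * 5
T = torch.cumprod(torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
(depth * alpha * T).sum().backward()
assert torch.allclose(alpha.grad, depth_alpha_grad_forward_order(depth, alpha.detach()), atol=1e-5)
```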
I can check that when I go back home tonight. Anyway, the backward pass is in reverse order (from n to 1).
Thank you so much!
Hello @wanmeihuali, I created PR #126 for this; any feedback would be appreciated.
Nice work @chenyuntc! I just saw your experiment, and the findings seem very reasonable. If there are many points, the rasterizer will stop very early, and only the first several points met by the ray will be optimized. You can try more steps to see if more points get optimized. Also, during real training, points with small alpha will periodically be removed, so I think this issue won't affect results too much during real training.
The next step is to test the code on a more complex case, e.g. a more complex object rather than a wall, and then finally test it on a real dataset. I'm not sure whether geometry supervision works (unless you get very accurate depth measurements, e.g. from lidar), but I guess letting points converge to the estimated depth (the surface) can help improve training quality.
> You can try more steps to see if more points get optimized.
Do you mean more training iterations? I tried 100K iterations, but the loss did not change at all after 10K iterations.
I find it's extremely difficult to make it work without low-alpha initialization. For example, if I initialize pts_feat[:, 7] = -4 instead of -5, the loss decreases in the beginning but soon saturates. (It also depends on the number of points; with more points, the initial value needs to be smaller.)
> Also, during real training, points with small alpha will periodically be removed.
I also tried removing points with low alpha, but it still did not work.
> I'm not sure whether geometry supervision works (unless you get very accurate depth measurements, e.g. from lidar).
Based on my previous experience with NeRF, depth supervision (from lidar) is super important for the sparse observations in urban-driving scenes. I will let you know if I can make it work in urban-driving scenes.
Maybe we can first try an easier solution, e.g.: given the list of points through which a ray passes, use some threshold to split them into three groups: points between the camera and the area close to the estimated/provided depth, points in the area close to the estimated/provided depth, and points deeper than the estimated/provided depth. All we need to do is penalize the first group, reward the second group, and ignore the third group. The penalty/reward amount shall also be proportional to the color weight (
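A rough sketch of what such a per-ray loss could look like, just to make the idea concrete (the function name, the margin threshold, and the exact penalty/reward form are my own illustration, not anything from the repo):

```python
import torch

def three_group_depth_loss(depth, alpha, T, gt_depth, margin=0.1):
    """Per-ray depth regularizer following the three-group idea above.

    depth, alpha, T: per-point depth, opacity and transmittance along one ray
    (sorted front to back); gt_depth: estimated/provided depth for this ray.
    """
    weight = alpha * T                          # per-point color weight
    before = depth < (gt_depth - margin)        # group 1: between camera and surface
    near = (depth - gt_depth).abs() <= margin   # group 2: close to the surface
    # group 3 (deeper than the provided depth) is ignored
    penalty = (weight * before).sum()           # push down weights in front of the surface
    reward = (weight * near).sum()              # pull weight onto the surface
    return penalty - reward
```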