Question on gradient of depth

Thanks for the great work! This work is awesome.

I note that the gradient is not backpropaged from the pred_depth, do you plan to add it? This would be super helpful for those who want to add geometry supervision.

taichi_3d_gaussian_splatting/taichi_3d_gaussian_splatting/GaussianPointCloudRasterisation.py

Line 972 in f7631e3

    
           def backward(ctx, grad_rasterized_image, grad_rasterized_depth, grad_pixel_valid_point_count):

Sounds not very difficult. I can try to do that if I have time... Also, contributions are welcomed!

Thank you @wanmeihuali , I quickly skim the code, seems challenging to me 😂 , need to propagage the gradient from predicted_depth to alpha (and then to point_uv) and point_depth, and then merge to get the gradient of covariance and point location.
Let me know if my understanding is not correct. I would be happy to give it a try if there is a simple way to implement it, though I haven't wrote taichi before.

I can provide some idea, please correct me if I am wrong.

$$ T_i=\prod_{j=1}^{i-1} (1-\alpha_{j}) $$

$$ d_{pixel}=\frac{\sum_{i=1}^{n}{d_{i} \alpha_{i} \prod_{j=1}^{i-1} (1-\alpha_{j})}}{\sum_{i=1}^{n}{\alpha_{i} \prod_{j=1}^{i-1} (1-\alpha_{j})}}=\frac{\sum_{i=1}^{n}{d_{i} \alpha_{i} T_i}}{\sum_{i=1}^{n}{\alpha_{i} T_i}} $$

in which $d_{pixel}$ is the pixel level depth, $d_{i}$ is the depth of the $i$-th effective gaussian point on the pixel. $\sum_{i=1}^{n}{\alpha_{i} T_i}$ is in range $(0,1)$ and as we only wants to do gradient descent on $d_{pixel}$, we can ignore it. Then we have:

$$ \frac{\partial{d_{pixel}}}{\partial{d_i}}=\frac{\alpha_{i} T_i}{\sum_{i=1}^{n}{\alpha_{i} T_i}} \propto \alpha_{i} T_i $$

$$ \frac{\partial{loss}}{\partial{d_i}}=\frac{\partial{loss}}{\partial{d_{pixel}}} \frac{\partial{d_{pixel}}}{\partial{d_i}} \propto \alpha_{i} T_i \frac{\partial{loss}}{\partial{d_{pixel}}} $$

If we only want geometry supervision for point location, then optimize point depth $d_i$ directly is enough. However, if we want to optimize $\alpha_i$ as well, then we need to consider the effect of $\alpha_i$ on $d_{pixel}$.

$$ \frac{\partial{loss}}{\partial{\alpha_i}}=\frac{\partial{loss}}{\partial{d_{pixel}}} \frac{\partial{d_{pixel}}}{\partial{\alpha_i}} + \frac{\partial{loss}}{\partial{color}} \frac{\partial{color}}{\partial{\alpha_i}} $$

in which $\frac{\partial{color}}{\partial{\alpha_i}}$ is the color gradient of the $i$-th effective gaussian point on the pixel, which is the current $\frac{\partial{loss}}{\partial{\alpha_i}}$ in the code. We just need to add $\frac{\partial{loss}}{\partial{d_{pixel}}} \frac{\partial{d_{pixel}}}{\partial{\alpha_i}}$ to it.

$$ \frac{\partial{d_{pixel}}}{\partial{\alpha_i}}=\frac{d_i T_i - \sum_{j=i+1}^{n}{\frac{d_j \alpha_j T_j}{1-\alpha_i}}} {\sum_{i=1}^{n}{\alpha_{i} T_i}} \propto {d_i T_i - \sum_{j=i+1}^{n}{\frac{d_j \alpha_j T_j}{1-\alpha_i}}} $$

in the code, we already have:

$$ w_i = \sum_{j=i+1}^{n}{\frac{color_j \alpha_j T_j}{1-\alpha_i}} $$

so we can define a new variable to store the value of $\sum_{j=i+1}^{n}{\frac{d_j \alpha_j T_j}{1-\alpha_i}}$ in a similar way. And handle $\frac{\partial{\alpha_i}}{\partial{position_i}}$ and $\frac{\partial{\alpha_i}}{\partial{covariance_i}}$ is already implemented in the code and you don't need to change it.

I created a draft PR(#122) that modified the rasterizer(taichi) part to support gradient from depth. The code is not tested(I don't have any test dataset either...), also, the pytorch part shall also need to be modified. You can start from here to modify the code and do experiments. @chenyuntc

I tested it this morning, it works for the pts' coordinates! Thank you so much!

import taichi as ti
from taichi_3d_gaussian_splatting.GaussianPointCloudRasterisation import GaussianPointCloudRasterisation
import torch
import numpy as np
from taichi_3d_gaussian_splatting.Camera import CameraInfo
from taichi_3d_gaussian_splatting.utils import se3_to_quaternion_and_translation_torch

RasterConifg = GaussianPointCloudRasterisation.GaussianPointCloudRasterisationConfig
def render(pts, pts_feat, c2w, intrin, HW):
    rasterisation = GaussianPointCloudRasterisation(
            config=RasterConifg(near_plane=0.4, far_plane=2000.0, depth_to_sort_key_scale=10.0, rgb_only=False),
        )
    camera_info = CameraInfo(camera_intrinsics=intrin.to(pts.device),camera_height=HW[0],camera_width=HW[1],
                             camera_id=0)   
    q_pointcloud_camera, t_pointcloud_camera = se3_to_quaternion_and_translation_torch(c2w[None])
    gaussian_input = GaussianPointCloudRasterisation.GaussianPointCloudRasterisationInput(
        point_cloud=pts.float(),
        point_cloud_features=pts_feat.cuda(),
        point_object_id=torch.zeros(pts.shape[0], dtype=torch.int32,device=pts.device),
        point_invalid_mask=torch.zeros(pts.shape[0], dtype=torch.int8,device=pts.device),
        camera_info=camera_info,
        q_pointcloud_camera=q_pointcloud_camera.cuda().contiguous(),
        t_pointcloud_camera=t_pointcloud_camera.cuda().contiguous(),
        color_max_sh_band=6, 
    )
    res = rasterisation(gaussian_input)
    return res
if __name__ == '__main__':
    ti.init(arch=ti.cuda, device_memory_GB=0.1)
    pts = torch.randn(10000,3,device='cuda').requires_grad_(True)
    pts_feat = torch.zeros(pts.shape[0], 56,device=pts.device)
    pts_feat[:, 0:4] = torch.rand_like(pts_feat[:, 0:4])
    pts_feat[:, 0:4] = pts_feat[:, 0:4] /  torch.norm(pts_feat[:, 0:4], dim=1, keepdim=True)
    pts_feat[:, 4:7] = torch.zeros(pts.shape[0],1,device=pts.device).repeat(1,3)
    pts_feat[:, 7] =  0.05
    pts_feat[:, 8] = 1.0
    pts_feat[:, 9:24] = 0.0
    pts_feat[:, 24] = 1.0
    pts_feat[:, 25:40] = 0.0
    pts_feat[:, 40] = 1.0
    pts_feat[:, 41:56] = 0.0
    pts_feat.requires_grad_(True)
   
    c2w = torch.eye(4,device='cuda')
    HW = [1080//64*16,1920//64*16]
    intrin = torch.Tensor([[100,0,200],[0,100,200],[0,0,1]]).cuda()
    iteration = 0
    import tqdm
    optimizer = torch.optim.Adam([pts,pts_feat],lr=0.01)
    for ii in tqdm.trange(100000):
        optimizer.zero_grad()
        res=render(pts, pts_feat, c2w, intrin, HW)
        mask = res[1]>0
        loss = (res[1]-3).abs()[mask].mean()
        loss.backward()
        optimizer.step()
        if ii%500==0:
            #save_img(res[1])
            #plot3d(pts)
            print(loss)

Thanks! @chenyuntc , If you further prove that the geometry supervision for $\alpha$ works(either by similar test code or your own experiment), please notify me or @yanzhoupan , and we will try to merge this branch. Also, can you put your plot code in a separate directory in the repo? e.g. scratch or experiment(you can just create one).

Hello @wanmeihuali , while I try to test the supervision for $\alpha$, I found it difficult to converge only optimizing the alpha.

I derive the gradient on my own, it seems different from the one you showed.

Given

$$ T_i=\prod_{j=1}^{i-1} (1-\alpha_{j})\\ $$

We have

$$ \frac{d_{T_i}}{d_{\alpha_{k}} } = -T_i / (1-\alpha_k)) $$

Let:

$$ f={\sum_{i=1}^{n}{d_{i} \alpha_{i} T_i}}, g={\sum_{i=1}^{n}{\alpha_{i} T_i}} $$

we convert $d_{pixel}=f/g$

$$ d_{pixel}=\frac{\sum_{i=1}^{n}{d_{i} \alpha_{i} \prod_{j=1}^{i-1} (1-\alpha_{j})}}{\sum_{i=1}^{n}{\alpha_{i} \prod_{j=1}^{i-1} (1-\alpha_{j})}}=\frac{\sum_{i=1}^{n}{d_{i} \alpha_{i} T_i}}{\sum_{i=1}^{n}{\alpha_{i} T_i}} = \frac f g \\ $$

We have

$$ d' \propto (f'g -g'f) $$

$$ f'=\frac {df}{d{\alpha_k}} = d_kT_k + \sum_{i=K+1}^n d_i\alpha_i \frac{-T_i}{1-\alpha_k} $$

$$ g'=\frac {dg}{d{\alpha_k}} = T_k + \sum_{i=K+1}^n \alpha_i \frac{-T_i}{1-\alpha_k} \\ $$

that is

$$ d' = (d_kT_k + \sum_{i=K+1}^n d_i\alpha_i \frac{-T_i}{1-\alpha_k})g - (T_k + \sum_{i=K+1}^n \alpha_i \frac{-T_i}{1-\alpha_k})f $$

$$ \frac {d'}{g} = (d_kT_k + \sum_{i=K+1}^n d_i\alpha_i \frac{-T_i}{1-\alpha_k}) - (T_k + \sum_{i=K+1}^n \alpha_i \frac{-T_i}{1-\alpha_k})\frac f g \\ $$

Because $d_{pixel}=f/g$

$$ \frac {d'}{g} =T_k(d_k-d_{pixel}) + \sum_{i=K+1}^n \alpha_i \frac{-T_i}{1-\alpha_k} (d_i - d_{pixel}) $$

I haven't done the calculus for a while, not sure correct or not, but this result is more intuitive to me as I can see the gradient is relative to the $(d_i - d_{pixel})$

Hi @chenyuntc , I see what happens. In my calculation, I treat $\sum_{i=1}^{n}{\alpha_{i} T_i}$ as a constant, but it is in fact a variable related to $\alpha$. So my calculation may not converge.
In your calculation, I'm wondering why are you calculating $\frac{d'}{g}$, I think that means $(\frac{1}{g}\frac{\partial{d_{pixel}}}{\partial{\alpha_k}})$? What you need shall be $\frac{\partial{d_{pixel}}}{\partial{\alpha_k}}$. And

$$ \frac{\partial{d_{pixel}}}{\partial{\alpha_k}}=\frac{\partial{\frac{f}{g}}}{\partial{\alpha_k}}=\frac{g\frac{\partial{f}}{\partial{\alpha_k}}-f\frac{\partial{g}}{\partial{\alpha_k}}}{g^2}=\frac{d_kT_k + \sum_{i=K+1}^n d_i\alpha_i \frac{-T_i}{1-\alpha_k}}{g} - (T_k + \sum_{i=K+1}^n \alpha_i \frac{-T_i}{1-\alpha_k})\frac{f}{g^2} $$

Which is actually your result divided by g... Anyway, have you tried your equation? Maybe the calculation does not need to be mathematically correct, it's more like designing the loss. I mean, As far as the gradient penalize the alpha for points far from $d_{pixel}$ and award the alpha for points close to $d_{pixel}$, it shall work.

Yes, your are correct. your final derivation looks correct to me.

In my derivation, the $\frac {d'} {g}$ is actually $\frac {\partial d_{pixel}} {\partial \alpha_k} *g$, because I use approximation $d'=\frac {(f'g -g'f)}{g^2} \propto (f'g -g'f)$

I haven't try the implementation in taichi yet as I'm not familiar with taichi. Would be appreciated if you can try it when available

Hello @wanmeihuali ,

while I'm going over the existing implementation, I have one question: The w_i in code , it seems to be
$$w_i = \sum_{j=1}^{i}{ {color_j \alpha_j T_j}}$$
rather than $$w_i = \sum_{j=i+1}^{n}{\frac{color_j \alpha_j T_j}{1-\alpha_i}}$$
as you mentioned (note the supscript is $i$ instead of $n$), Right? Unless it's inverse order?
You may ignore this one, if it's inverse order. What I did here is trying to figure out how to implement it in the forward-order. I note that

$$ \sum_{i=K+1}^{n} \alpha_i T_i = g_n-g_k $$

$$ \sum_{i=K+1}^{n} \alpha_i T_i d_i = f_n-f_k $$

where $f_k={\sum_{i=1}^{k}{d_{i} \alpha_{i} T_i}}, g={\sum_{i=1}^{k}{\alpha_{i} T_i}}$, This can help simplified the gradient to be

$$ \frac {\partial d_{pixel}} {\partial \alpha_k} = \frac{T_k(d_k-d_{pixel}) + \frac 1 {1-\alpha_k}[ (f_n-f_k) - (g_n-g_k)d_{pixel}]} {g_n} $$

That means we need to store pixel_accumulated_alpha ($g_n$) and accumulated_depth ($f_n$ ?) and rasterized_depth ($d_{pixel}$? } ) for backward, and compute the $f_k$ and $g_k$ in the for-loop.
Not sure whether it can help simplify the implementation 🤔

I can check that when I go back to home tonight. Anyway, backward is in reverse order(from n to 1)

Thank you so much!

Hello @wanmeihuali , I created a PR #126 for this, any feedbacks would be appreciated

Nice work @chenyuntc ! Just see your experiment, the findings seem very reasonable. If there are many points, the rasterizer will stop very early, and only the first several points met by the ray will be optimized. You can try more steps to see if more points get optimized. Also, during real training, points with small alpha will periodically be removed. So I think this issue won't affect results too much during real training.

The next step is to test the code on some more complex case, e.g. more complex object rather than a wall. Then finally test it on a real dataset, I'm not sure if geometry supervision works(unless you get very accurate depth measurements, e.g. from lidar), but I guess letting points converge to the depth estimated(the surface) can help improve training quality.

You can try more steps to see if more points get optimized.

Do you mean more training iterations? I tried for 100K iterations, but the loss did not change at all after 10K iterations.

I find it's extremely difficult to make it work without low-alpha initialization
For example, if I initial pts_feat[:, 7] = -4 instead of -5, the loss decrease in the beginning but then soon saturated. (it also depends on the number of points, with more points, the initial value need to be smaller)

Also, during real training, points with small alpha will periodically be removed.

I also tried removing pts with low-alpha, but still not worked.

I'm not sure if geometry supervision works(unless you get very accurate depth measurements, e.g. from lidar),

Based on my previous experience on NeRF, depth supervision (from lidar) is super important for the sparse observations in urban-driving scenes. I will let you know if I can make it work in the urban-driving scenes.

Maybe we can first try some easier solution. e.g.
given a list of points through which a ray passes, use some threshold to split the points into three groups: from camera to area close to the estimated/provided depth, points in the area close to the estimated/provided depth, and points deeper than estimated/provided depth. All we need to do is to penalize the first group, award the second group, and ignore the third group. The penalize/award amount shall also be proportional to the color weight($\alpha_i T_i$). Because we are using Gradient Desend. As long as the direction is correct, it shall converge to the estimated/provided depth, the gradient(optimization direction) does not need to be mathematically correct. @chenyuntc