wanmeihuali/taichi_3d_gaussian_splatting

Tiles-like artifacts

nishida-naoki opened this issue · 7 comments

@wanmeihuali

I found tile-like artifacts when visualizing some parquet files, as shown in the attached image.

After a short inspection, I found that gaussian_alpha in the following lines takes the value inf at these pixels, which causes a whole tile to be filled with a single color.
https://github.com/wanmeihuali/taichi_3d_gaussian_splatting/blob/main/taichi_3d_gaussian_splatting/GaussianPointCloudRasterisation.py#L403-L407

Do you have any idea about this issue, or any suggestions for further investigation?

To reproduce the issue, please download the attached parquet file and run `python3 visualizer.py --parquet_path_list refined.parquet`.

By the way, thank you for sharing your great project! 😄

[Attachment: parquet.zip]

[Screenshot from 2023-09-05 11-36-17: tile-like artifacts in the rendered view]

This seems to be a numerical stability issue. Perhaps very small values occur in some places, leading to problems like division by zero.

Strategy One: One approach is to directly remove these points from the parquet file; pandas should make this straightforward. A more elegant method would be to eliminate these points in the parquet-saving logic. After all, whenever these points appear they affect the inference results, so we should always remove them.
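A minimal sketch of what Strategy One could look like with pandas (assumptions: one point per row and all numeric columns are features worth checking; the file names follow the reproduction step above):

```python
import numpy as np
import pandas as pd

df = pd.read_parquet("refined.parquet")

# Keep only rows whose numeric columns are all finite (no nan/inf).
numeric = df.select_dtypes(include=[np.number])
finite_mask = np.isfinite(numeric.to_numpy()).all(axis=1)
print(f"Dropping {(~finite_mask).sum()} of {len(df)} points")
df[finite_mask].to_parquet("refined_filtered.parquet")
```

np.isfinite catches both nan and ±inf in a single pass, so no separate isnan/isinf checks are needed.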

Strategy Two: Directly remove points with nan/inf values during training. The advantage of this approach is that training can then mitigate the impact of these inf points. The current program handles the nan case here:

```python
nan_mask = torch.isnan(pointcloud_features).any(dim=1)
```

where it periodically removes points with nan feature values. However, it seems I may have forgotten to consider the inf scenario. Could you try using torch.isinf in the same way as the code above to remove points with inf values?
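A hedged sketch of that change with toy tensors (in the real training loop the same mask would be applied to every per-point tensor and to the optimizer state, as the existing nan path does; names other than pointcloud_features are stand-ins):

```python
import torch

# Toy stand-ins for the trainer's per-point tensors; the feature width
# here is arbitrary.
pointcloud = torch.randn(5, 3)
pointcloud_features = torch.randn(5, 8)
pointcloud_features[2, 0] = float("inf")   # simulate a corrupted point
pointcloud_features[4, 1] = float("nan")

# Flag points whose features contain nan *or* inf.
invalid_mask = (torch.isnan(pointcloud_features).any(dim=1)
                | torch.isinf(pointcloud_features).any(dim=1))

# Drop them, mirroring how nan_mask is applied in the training loop.
pointcloud = pointcloud[~invalid_mask]
pointcloud_features = pointcloud_features[~invalid_mask]
print(pointcloud_features.shape)  # torch.Size([3, 8])
```

torch.isfinite would also cover both cases in a single call.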

Strategy Three: Locate the places where division by zero might be occurring. The alpha gradient is computed here:

```python
alpha_grad_from_rgb = (color * T_i - w_i / (1. - alpha)) \
```

I just revisited it and couldn't immediately identify where a division by zero might happen. The adaptive controller doesn't modify alpha either; we may need breakpoint debugging to locate this bug... I'll try to look into it tomorrow or when I have time.
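For intuition on why that line is a suspect, here is a tiny standalone demo (all constants invented) of the w_i / (1 - alpha) term diverging as alpha approaches 1:

```python
import numpy as np

color, T_i, w_i = 0.8, 0.5, 0.3
for alpha in np.array([0.5, 0.999, 1.0]):
    with np.errstate(divide="ignore"):
        grad = color * T_i - w_i / (1.0 - alpha)
    print(f"alpha={alpha}: alpha_grad_from_rgb={grad}")
# alpha=1.0 yields -inf; once an inf enters the gradients it can
# propagate into the point features before any periodic filtering runs.
```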

Thank you for your quick reply!

I tried Strategy One, but it did not work well. I removed the suspicious rows (e.g. those containing inf, nan, or extremely small or large values), but the artifacts remained. It turns out that features within a normal range are one source of these artifacts, although features with nan or inf can cause the same kind of artifacts.

I am now trying Strategy Two as you suggested. However, simply adding torch.isinf does not work well, because features within a normal range can also cause artifacts.

The exact line that makes the alpha values inf seems to be the following. The inf comes from the argument of ti.exp(), which is determined by the combination of the two variables conic and xy_mean. Neither of them alone seems to be a reliable indicator of the artifacts (e.g. setting a threshold on conic did not work well).

```python
exponent = -0.5 * (xy_mean.x * xy_mean.x * conic.x + xy_mean.y * xy_mean.y * conic.z) \
```
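To make the failure mode concrete: for a valid (positive-definite) conic the quadratic form is non-negative, so the exponent is at most 0 and ti.exp() stays within [0, 1]; if the conic's signs flip (e.g. via a near-zero or negative det_cov upstream), the exponent can turn large and positive, and exp() overflows to inf. A tiny numpy illustration with invented values:

```python
import numpy as np

conic = np.array([-30.0, 0.0, -30.0], dtype=np.float32)  # indefinite on purpose
xy_mean = np.array([3.0, 3.0], dtype=np.float32)
exponent = np.float32(-0.5) * (xy_mean[0] ** 2 * conic[0]
                               + xy_mean[1] ** 2 * conic[2])
with np.errstate(over="ignore"):
    print(np.exp(exponent))  # inf: float32 exp overflows above ~88
```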

Hi @nishida-naoki, how do you set the threshold on conic? I just noticed that the division by zero can happen when computing the conic.

Have you tried preventing det_cov from becoming zero? E.g. `if ti.math.abs(det_cov) < 1e-5: det_cov = ti.math.sign(det_cov) * 1e-5`.
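A minimal Taichi sketch of that guard (assumption: the symmetric 2D covariance is packed as (cov_xx, cov_xy, cov_yy), matching how conic.x and conic.z multiply the squared terms above; this illustrates the suggestion, it is not the repo's actual function):

```python
import taichi as ti

ti.init(arch=ti.cpu)

@ti.func
def compute_conic(cov):
    # cov = (cov_xx, cov_xy, cov_yy); the conic is the inverse covariance.
    det_cov = cov.x * cov.z - cov.y * cov.y
    # Clamp the determinant away from zero, preserving its sign,
    # so that 1 / det_cov stays finite.
    if ti.abs(det_cov) < 1e-5:
        det_cov = 1e-5 if det_cov >= 0.0 else -1e-5
    return ti.math.vec3(cov.z, -cov.y, cov.x) / det_cov

@ti.kernel
def demo():
    # A nearly degenerate covariance that would otherwise divide by ~0.
    conic = compute_conic(ti.math.vec3(1e-3, 1e-3, 1e-3))
    print(conic)  # finite values thanks to the clamp

demo()
```

Note that a sign-preserving clamp keeps 1/det_cov finite but does not make a non-positive-definite conic valid, so inf alphas could still occur through the exp() overflow shown above.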

jb-ye commented

@wanmeihuali I encountered the same issue. It actually happens quite frequently in the few examples I tried. I tried limiting det_cov, but it didn't work for me...

@nishida-naoki Do you have a fix for your example?

@jb-ye Hi,
For better visualization, inserting these two lines here was enough for me:

```diff
+                if abs(gaussian_alpha) >= np.inf:
+                    continue
```

But filtering the invalid Gaussians out during training does not seem so simple, because abs(gaussian_alpha) >= np.inf does not always imply an invalid inv_cov or other invalid intermediate variables.

jb-ye commented

We can close this issue, as the root cause is fixed in the latest PR #153.

Thanks a lot for your help! @jb-ye @wanmeihuali