MrNeRF/gaussian-splatting-cuda

This implementation has slightly different outputs than the original code


Some other issues (#51, #43) pointed out performance (PSNR) and Gaussian-count differences w.r.t. the original repo. Since this repo exposes the same CUDA interface for forward/backward, I fed the same tensors to both implementations and inspected the outputs (no training involved, just one forward call and one backward call).

It turns out that there is some numerical difference (a relative error of ~0.01). I didn't spot any obvious difference in the code; I suspect it might be due to intrinsics like __fmaf, which I referred to in issue #36, but I'm really not sure.
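For reference, this is the kind of comparison I mean. A minimal sketch; out_a/out_b are hypothetical stand-ins for the outputs of the two builds fed with identical input tensors, not code from either repo:

```python
import torch

def max_rel_error(a: torch.Tensor, b: torch.Tensor) -> float:
    # Elementwise relative error, guarded against division by zero.
    return ((a - b).abs() / b.abs().clamp_min(1e-12)).max().item()

# out_a, out_b: hypothetical outputs of the two rasterizer builds
# for the same forward (or backward) call on the same inputs.
# print(max_rel_error(out_a, out_b))  # ~1e-2 in my experiment
```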

As a result, people who want this repo to match the official one will need to manually tune thresholds such as the gradient-magnitude threshold used for densification; see the sketch below. Otherwise you can end up with a smaller Gaussian count (e.g. #43), which results in inferior quality.
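To illustrate why a ~1% numerical shift matters: a hypothetical sketch, assuming the densification criterion both repos implement (the official repo's densify_grad_threshold defaults to 0.0002; the gradient magnitudes here are random placeholders):

```python
import torch

# Hypothetical per-Gaussian averaged screen-space gradient norms.
avg_grad = torch.rand(100_000) * 4e-4

threshold = 2e-4  # densify_grad_threshold default in the official repo
baseline = (avg_grad >= threshold).sum().item()

# A systematic ~1% downward shift in the gradients (the relative error
# observed above) moves points below the threshold, so fewer Gaussians
# get cloned/split at each densification step.
shifted = (avg_grad * 0.99 >= threshold).sum().item()
print(baseline, shifted)
```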

Since this is only an implementation difference, not a code bug, I'm leaving this here just for reference.

I think the differences are mostly due to differences in the PyTorch/libtorch interface (libtorch is less capable and supports far fewer operations). And I suspect there are some implementation diffs between the two (lol, look at the optimizer state concatenations). You can plug in the original rasterizer and the difference will persist (and vice versa).

Keep in mind that you cannot compare anything that was processed by the backward pass, as the results are non-deterministic. So you will always see numeric differences even if you run the same input through the backward pass twice.

differences in the pytorch/libtorch

No, I don't use libtorch at all. What I did is replace the original cuda_rasterizer directory with your cuda_rasterizer dir and build it using the original repo's setup. Then I ran one forward pass and one backward pass with the same tensors by calling the C interface (e.g. https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/59f5f77e3ddbac3ed9db93ec2cfe99ed6c5d121d/diff_gaussian_rasterization/__init__.py#L86). There is no training involved, so no optimizer state or anything else.
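In case it helps reproduce, here is a minimal sketch of that A/B test through the package's Python wrapper, assuming the standard GaussianRasterizationSettings/GaussianRasterizer API; all shapes, camera matrices, and values below are placeholders, not my actual test data:

```python
import torch
from diff_gaussian_rasterization import (
    GaussianRasterizationSettings,
    GaussianRasterizer,
)

torch.manual_seed(0)
N = 10_000  # placeholder Gaussian count

# Placeholder inputs; in the real test the same saved tensors are fed
# to the package built against each cuda_rasterizer directory.
means3D = torch.randn(N, 3, device="cuda", requires_grad=True)
means2D = torch.zeros(N, 3, device="cuda", requires_grad=True)
opacities = torch.rand(N, 1, device="cuda", requires_grad=True)
scales = torch.rand(N, 3, device="cuda", requires_grad=True)
rotations = torch.randn(N, 4, device="cuda", requires_grad=True)
shs = torch.randn(N, 16, 3, device="cuda", requires_grad=True)

settings = GaussianRasterizationSettings(
    image_height=546, image_width=980,
    tanfovx=0.5, tanfovy=0.3,
    bg=torch.zeros(3, device="cuda"),
    scale_modifier=1.0,
    viewmatrix=torch.eye(4, device="cuda"),   # placeholder camera
    projmatrix=torch.eye(4, device="cuda"),   # placeholder projection
    sh_degree=3,
    campos=torch.zeros(3, device="cuda"),
    prefiltered=False,
    debug=False,
)
rasterizer = GaussianRasterizer(raster_settings=settings)

# One forward call and one backward call, as described above.
color, radii = rasterizer(
    means3D=means3D, means2D=means2D, shs=shs,
    opacities=opacities, scales=scales, rotations=rotations,
)
color.sum().backward()
# color and the input .grad tensors are what get compared between builds.
```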

results are non-deterministic

Yes, it is not deterministic, but such a big relative error still suggests there is some real difference, imo. If you run either implementation's backward twice, you only get a very small difference, around ~1e-6.
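That is, something along these lines (a sketch; render_fn and inputs are hypothetical stand-ins for either rasterizer's autograd entry point and its input tensors):

```python
import torch

def grads_for(render_fn, inputs):
    # One forward + one backward on fresh leaf copies of the inputs;
    # return the resulting input gradients.
    leaves = [t.detach().clone().requires_grad_(True) for t in inputs]
    render_fn(*leaves).sum().backward()
    return [t.grad for t in leaves]

# g1 = grads_for(render_fn, inputs)
# g2 = grads_for(render_fn, inputs)
# Atomic adds make g1 != g2 run-to-run, but only at the ~1e-6 level,
# orders of magnitude below the ~1e-2 cross-implementation gap.
```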