turandai/gaussian_surfels

Inconsistent Chamfer Distances between subsequent trainings


Hi, thanks for the great work.

I'm trying to reproduce the Chamfer Distance results reported in the paper for the DTU dataset. However, there is randomness in the training, which leads to inconsistent Chamfer Distances between subsequent trainings of the same scene, even though manual seeds were properly set for both training and rendering.
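For reference, the seeding was along these lines (a minimal sketch of typical PyTorch seeding, not necessarily the exact calls used):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # Seed all RNG sources used by a typical PyTorch training/rendering pipeline.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Full determinism may additionally require these cuDNN flags (slower).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```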

Could you please explain which procedure you followed to report the numerical results in the paper?

Hi, we didn't deal with the randomness; we ran the code for multiple rounds and reported the best result among them in the paper.

Hi, I checked the code and a few settings have changed compared to the original paper version (a rough sketch of these overrides follows the list):

  1. the test set selection in the paper follows NeuS here; please use L450-453 instead of L449 on DTU,
  2. the grad threshold here is multiplied by 0.4 instead of 0.5,
  3. the weight of the mask loss here is 0.1 instead of 1,
  4. the mesh smoothing here is disabled.
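Putting those four changes together, the overrides would look roughly like this; the option names below are hypothetical, since the actual keys in the repo's config or argument parser may be named differently:

```python
# Hypothetical names for the four changes above; the real config keys / CLI flags
# in the repo are likely named differently.
paper_settings = {
    "dtu_test_split": "neus",      # 1. NeuS-style test set selection (L450-453, not L449)
    "grad_threshold_scale": 0.4,   # 2. grad threshold multiplied by 0.4 instead of 0.5
    "mask_loss_weight": 0.1,       # 3. mask loss weight 0.1 instead of 1
    "use_mesh_smoothing": False,   # 4. mesh smoothing disabled
}
```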

I just ran another round with the modified config; the results are overall consistent with the paper despite the randomness:
| DTU scan | CD | PSNR |
| --- | --- | --- |
| scan24 | 0.67 | 29.83 |
| scan37 | 0.94 | 26.94 |
| scan40 | 0.54 | 30.08 |
| scan55 | 0.45 | 32.31 |
| scan63 | 0.98 | 34.22 |
| scan65 | 1.09 | 32.22 |
| scan69 | 0.93 | 29.94 |
| scan83 | 1.23 | 36.84 |
| scan97 | 1.19 | 28.62 |
| scan105 | 0.78 | 33.73 |
| scan106 | 0.84 | 34.96 |
| scan110 | 1.84 | 32.38 |
| scan114 | 0.52 | 31.47 |
| scan118 | 0.66 | 37.76 |
| scan122 | 0.60 | 38.04 |
| mean | 0.89 | 32.63 |

Thank you very much for the follow-up comment!

I have a couple of follow-up questions.
You write that the results are consistent; do you mean that your method always outperforms some prior works and underperforms compared to others, or do you mean something else?
Additionally, I notice that the results in this comment differ from the paper by up to 0.26 in terms of CD (scan110). Is this due to the randomness we discussed? Could you elaborate on this?

Thank you!

Hi, due to the randomness the overall mean is generally around 0.88~0.92. No method involved in our comparison lies in this range, so I think we can say our method consistently and stably under-/outperforms the others.
For the special case of scan110, there are other modifications that might lead to this difference:
1. We linearly decay the weight of the normal prior loss, in case the normal prior is not correct on more general scenes (a minimal sketch of such a schedule follows this list).
2. We adjust the cutting threshold from the original 1e-5 to 1e-4 to prevent holes, also considering the performance on more general scenes.
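For illustration, the linear decay in point 1 could be implemented roughly as below; the function name, schedule endpoints, and the threshold constant are assumptions, not the repo's exact code:

```python
def normal_prior_weight(iteration: int,
                        w_init: float = 1.0,
                        w_final: float = 0.0,
                        decay_start: int = 0,
                        decay_end: int = 15000) -> float:
    """Linearly decay the normal prior loss weight between decay_start and decay_end.

    The default values are placeholders, not the repo's actual settings.
    """
    if iteration <= decay_start:
        return w_init
    if iteration >= decay_end:
        return w_final
    t = (iteration - decay_start) / (decay_end - decay_start)
    return (1.0 - t) * w_init + t * w_final

# Point 2: cutting threshold raised from 1e-5 to 1e-4 to prevent holes.
CUT_THRESHOLD = 1e-4
```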
After reversing these two changes, I got a Chamfer Distance of 1.61 on scan110, which is closer to the reported 1.58. To my understanding, there might be more randomness in cases with strong reflections like scan110/97, on which our algorithm usually fails to converge. Apart from the randomness, another reason for the difference might be different parameter settings. I am sorry that we lost track of the exact original parameter settings after a lot of tuning in many places. The version of the code I am now trying to revert to, as mentioned above, produces an overall better score than the reported one, but differences still exist on some of the scenes:
| DTU scan | CD | PSNR |
| --- | --- | --- |
| scan24 | 0.655 | 29.97 |
| scan37 | 0.878 | 26.98 |
| scan40 | 0.582 | 29.90 |
| scan55 | 0.433 | 32.33 |
| scan63 | 0.881 | 34.32 |
| scan65 | 1.093 | 32.34 |
| scan69 | 0.900 | 29.89 |
| scan83 | 1.204 | 36.80 |
| scan97 | 1.285 | 28.60 |
| scan105 | 0.763 | 33.78 |
| scan106 | 0.767 | 34.96 |
| scan110 | 1.609 | 32.57 |
| scan114 | 0.506 | 31.49 |
| scan118 | 0.646 | 37.81 |
| scan122 | 0.606 | 38.01 |
| mean | 0.855 | 32.65 |