nianticlabs/diffusionerf

Which LPIPS model is used in the paper?

anonymous-pusher opened this issue · 6 comments

Hello,
thank you for sharing the code of your great work.
I noticed that in the paper, the comparison with other state-of-the-art methods shows inconsistencies between LPIPS and the other metrics: your method is better on LPIPS but not necessarily on the others, especially in the extreme 3-view setting on the DTU dataset.
I see in your code that you load both the lpips_alex and lpips_vgg models for evaluation, but only lpips_alex is actually used, so I wonder whether it was also used for the comparisons in the paper.
It is worth noting that the methods you compare against, such as PixelNeRF and RegNeRF, use the VGG variant of LPIPS for their evaluation.

Hi, yeah, I have the same question. Are you using the AlexNet or the VGG variant of LPIPS in your paper? Thank you in advance; awaiting your reply.

Hi there, thank you for bringing this to our attention. We used lpips_alex in our evaluations, so that may give rise to a small discrepancy w.r.t. the baselines if they used lpips_vgg. I just had another quick look at the RegNeRF paper and couldn't find where the LPIPS version is specified, but perhaps I missed something, or you found it in the source code? I also checked PixelNeRF, and they do say they used VGG, so apologies for the oversight.

Best,
Jamie

Hello, thank you for your answer.
You are right that RegNeRF does not specify the LPIPS version, and for some reason it is not included in their JAX code:
https://github.com/google-research/google-research/blob/15134e6b8c4caf1c326b2ae3e92bd6fbb50d6d18/regnerf/eval.py#L79C15-L79C15
However, since they compare with PixelNeRF and evaluate on PixelNeRF's test data ("We adhere to the protocol of Yu et al. [62] and evaluate on their reported test set of 15 scenes."), it is reasonable to assume they also used the VGG variant of LPIPS for their comparison.
Moreover, another related work, FreeNeRF, reproduced RegNeRF (with a slight change to the input) and achieved similar LPIPS results:

[screenshot of FreeNeRF's reported results]

while using VGG for the evaluation, as can be seen in their code:
https://github.com/Jiawei-Yang/FreeNeRF/blob/baedb01ac800a8f242ab246c674c6c4aa2d9b974/eval.py#L40

I wonder if there are plans to update the comparison numbers in the paper.

Thank you.

Thank you for this - in that case, we will repeat the LLFF and DTU evaluations using the VGG version of LPIPS and update the numbers in the paper accordingly.

Best wishes,
Jamie

Hi @mJones00

Thank you very much for bringing this oversight to our attention!

We have now updated Table 1 in the main paper to use LPIPS-VGG, and added a note in the supplemental material (Section 5) clarifying which LPIPS version is used where in the paper. We have also added your name to the acknowledgements for finding the issue. The new version of the paper is now on arXiv, and we have added an update to the README as well.

Hi @daniyar-niantic and @jamiewynn ,
Thank you for your updates and your transparency. I also appreciate the acknowledgement.