fuxiao0719/GeoWizard

About the results

Closed this issue · 4 comments

Thanks for sharing this excellent work!
I noticed that the image resolution used in your paper (576x768) differs from that in Marigold (480x640). Is this comparison fair? The results from the higher resolution in your paper are directly compared with those in Marigold.

Thanks for the question! Actually, Marigold is trained on two resolutions: 480×640 on Hypersim and the KITTI benchmark resolution (1216×352) on Virtual KITTI (see '4.2. Evaluation' and 'A.1. Mixed Dataset Training'). Besides, I think resolution is not the key factor. Rather, it is the mutual guidance provided by the normal constraint in the latent space (see Ablation Study).


How about the resolution during inference? Did you resize the testing images from different evaluation datasets to a fixed resolution?

I find this to be a very interesting phenomenon.
Can I interpret this phenomenon to mean that the mixed dataset training strategy might benefit the model by helping it adapt to different evaluation datasets containing various resolutions? If the resolution during evaluation matches the training resolution, the performance could be better than with this mixed strategy.

In our evaluation, we didn't change the aspect ratio, as we found that the diffusion model is highly adaptable to various resolutions. Indeed, it might be an interesting phenomenon. But personally, I don't think the mixed strategy is a proper one when scaling up to more diverse datasets with a wider range of image resolutions, since it may hurt the generalization ability of a single model. We find that Marigold can't handle the sky very well (see the comparison on the Project Page). This may be due to the mixed strategy, but I'm not sure.
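
As a rough illustration of what "not changing the aspect ratio" can look like at inference time, here is a minimal sketch that picks a target size for a test image. It is not the evaluation code from this repository: the helper name `target_size`, the `max_edge=768` default, and the rounding to a multiple of 8 (assumed here because Stable Diffusion style VAEs downsample by a factor of 8) are all illustrative assumptions.

```python
def target_size(width, height, max_edge=768, multiple=8):
    """Return a (width, height) that roughly preserves the aspect ratio.

    Hypothetical helper: scales the longer edge to `max_edge`, then snaps
    both edges to the nearest multiple of `multiple` (assumed VAE stride).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # One scale factor for both edges, so the aspect ratio is kept
    # up to the rounding applied below.
    scale = max_edge / max(width, height)

    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h


if __name__ == "__main__":
    # The two training resolutions mentioned above, given as (width, height).
    print(target_size(640, 480))
    print(target_size(1216, 352))
```

Under these assumptions, a 640×480 input maps to 768×576 (the same 4:3 ratio), while a KITTI-sized 1216×352 input stays wide instead of being squeezed into a fixed shape.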


Yes, I agree. Perhaps there could be a better strategy to address the resolution issue.
Thanks for your reply!