Why does the sparse depth come from MVS instead of SfM in the code?
DingYikang opened this issue · 3 comments
Hi,
I found that your code uses COLMAP's MVS pipeline to generate a fused point cloud and then derives the depth maps from it. But the paper says you use COLMAP's SfM (the sparse depth in Fig. 2 also comes from MVS rather than SfM). The sparse depth from SfM is actually very sparse. If you use COLMAP's MVS to reconstruct the point cloud, the depth from that point cloud is already accurate (though a little sparse). So this confused me a lot. Why do you use NeRF to generate depth? Tab. 3 shows that the depth from DepthNet is already very accurate and that NeRF helps only a little. I guess that with a deeper DepthNet and a better loss you could get better depth.
Actually, we follow the convention of Free View Synthesis, and the paper states that we acquire per-view sparse depth maps by projecting the fused 3D point clouds after multi-view stereo. The MVS depths are very noisy (see COLMAP in Table 1), so we need fusion to get sparse but clean depths. For a fair comparison, we use the same pretrained depth network as CVD, and the 0.16 gap on the Abs Rel metric is not marginal.
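For readers following the thread, here is a minimal sketch of what "projecting the fused 3D point clouds" into a per-view sparse depth map can look like. It is not taken from the repository: the function name `project_points_to_sparse_depth`, the pinhole model, and the world-to-camera convention `X_cam = R @ X_world + t` are all assumptions made for illustration.

```python
import numpy as np

def project_points_to_sparse_depth(points_world, K, R, t, height, width):
    """Project an (N, 3) fused point cloud into one view.

    Hypothetical helper, assuming a pinhole camera with intrinsics K (3x3)
    and world-to-camera extrinsics R (3x3), t (3,).
    Returns an (height, width) depth map with 0 where no point lands.
    """
    # World -> camera coordinates: X_cam = R @ X_world + t
    pts_cam = points_world @ R.T + t
    z = pts_cam[:, 2]

    # Discard points behind (or on) the camera plane
    in_front = z > 1e-6
    pts_cam, z = pts_cam[in_front], z[in_front]

    # Pinhole projection to pixel coordinates
    uv = pts_cam @ K.T
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)

    # Keep only points that fall inside the image
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Z-buffer: when several points hit one pixel, keep the nearest
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth
```

Pixels left at 0 carry no supervision, which is why the resulting map is sparse even when the fused cloud itself is clean.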
(1) However, the paper claims more than three times that you use SfM results to train DepthNet (especially in the abstract). How do you explain this? I think you can easily tell the two apart, so why did you say you use SfM? This is very misleading. If you did it on purpose, it is a big problem.
Using MVS versus SfM makes a big difference. You say you are tackling the MVS task, but you run COLMAP's MVS pipeline first, so the following steps (your main contributions) can just be regarded as refinement, right? If you had indeed used SfM's sparse depth, this paper would be nice work, but you didn't.
(2) You use only COLMAP's sparse depth as supervision (poor accuracy in Tab. 1) to train DepthNet, yet you get much more accurate depth priors from DepthNet (see Tab. 3). I don't think that makes sense. How did you achieve it?
(1) However, we think it is also not suitable to replace "SfM reconstruction" with "MVS reconstruction", because we only use the sparse depths after fusion. An alternative may be "traditional 3D reconstruction". Thanks for your suggestion; we will update our article.
(2) When evaluating COLMAP, we use all MVS depths. The results indicate that most of the MVS depths are noisy and only the sparse depths after fusion are accurate. Traditional MVS methods cannot handle indoor scenes well and produce many wrong depths. However, our method can tackle this issue, so it is not just a refinement.
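To make the "all MVS depths" versus "sparse depths after fusion" distinction concrete, here is a minimal sketch of an Abs Rel computation restricted to a validity mask. It assumes the common definition, mean(|d - d*| / d*) over pixels with ground truth; the thread does not spell out the exact evaluation protocol, and `abs_rel` and its `valid` argument are hypothetical names.

```python
import numpy as np

def abs_rel(pred, gt, valid=None):
    """Absolute relative depth error over pixels with ground truth.

    Hypothetical helper: `valid` is an optional boolean mask selecting the
    pixels where the method actually produced a depth (e.g. fused points).
    """
    mask = gt > 0                      # pixels that have ground truth
    if valid is not None:
        mask = mask & valid            # ...and a prediction to score
    if not mask.any():
        return float("nan")
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))
```

Scoring the same method with and without the mask can give very different numbers, so a noisy dense map and a clean sparse map are not directly comparable unless the pixel set is stated.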