Tanks and Temples Setup
tejaskhot opened this issue · 18 comments
Thanks for sharing the code.
I am trying to reproduce the results on Tanks and Temples with the pre-trained model but have not succeeded so far. An example camera file looks like:
extrinsic
0.333487 -0.0576322 -0.940992 -0.0320506
0.0582181 -0.994966 0.0815704 -0.0245921
-0.940956 -0.0819853 -0.328452 0.248608
0.0 0.0 0.0 1.0
intrinsic
1165.71 0 962.81
0 1165.71 541.723
0 0 1
0.193887 0.00406869 778 3.35933
I have parsed the file to adjust the depth min and max, but it doesn't seem to help much. I only have 12 GB of GPU memory, so I am running at half the image resolution, which shouldn't hurt much. However, the outputs I am getting are pretty bad and nothing like the paper. Moreover, I find that I have to change the parameters for every single scan (horse, family, etc.) separately, and no single set of values seems to apply to all of them.
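For reference, this is roughly how I parse these camera files to pull out the extrinsic, intrinsic, and depth range (my own helper, not a repo script; the assumption that the last line stores depth_min, interval, depth_num, depth_max is mine):

```python
import numpy as np

def parse_cam_txt(path):
    """Parse an MVSNet-style cam.txt: 'extrinsic' (4x4), 'intrinsic' (3x3),
    and a final line assumed to hold depth_min, interval, depth_num, depth_max."""
    with open(path) as f:
        tokens = f.read().split()
    e = tokens.index('extrinsic') + 1
    i = tokens.index('intrinsic') + 1
    extrinsic = np.array(tokens[e:e + 16], dtype=np.float64).reshape(4, 4)
    intrinsic = np.array(tokens[i:i + 9], dtype=np.float64).reshape(3, 3)
    depth_min, interval, depth_num, depth_max = map(float, tokens[i + 9:i + 13])
    return extrinsic, intrinsic, depth_min, interval, int(depth_num), depth_max
```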
@YoYo000 Since there are multiple similar questions on this, it'd be great if you could please summarize the detailed steps for reproducing the scan results including the parameters to use and changes to the repository scripts if any.
I will generate the original cam.txt files for the Tanks and Temples dataset soon. The depth min and max of the above camera might be a little too relaxed.
Meanwhile, if you are using the resized images, some post-processing parameters might need to be further tuned. I will try to find a proper config and provide a simple script.
Thanks a lot @YoYo000 !
Do you have an approx timeline for this? By when do you think it'll be possible to share?
@tejaskhot Hopefully today or tomorrow
@tejaskhot you could use these cams and try the new commit for the Family scene:
python test.py --dense_folder '/data/intel_dataset/intermediate/mvsnet_input/family/' --max_d 256 --max_w 960 --max_h 512 --interval_scale 1
python depthfusion.py --dense_folder '/data/intel_dataset/intermediate/mvsnet_input/family' --prob_threshold 0.6 --disp_thresh 0.25 --num_consistent 4
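As a side note on how these flags interact with the depth line in cam.txt (treat the exact formula as my reading, not a spec): --max_d planes are sampled starting from depth_min, spaced by interval * interval_scale, so the far plane the network actually sees is roughly:

```python
def effective_depth_range(depth_min, interval, max_d=256, interval_scale=1.0):
    # Assumed sampling: max_d planes from depth_min, spaced by interval * interval_scale.
    return depth_min, depth_min + (max_d - 1) * interval * interval_scale

# With the relaxed camera posted above (depth_min=0.193887, interval=0.00406869),
# max_d=256 and interval_scale=1 only reach ~1.23, far short of the listed 3.359,
# which is why a tight per-scene depth range matters.
print(effective_depth_range(0.193887, 0.00406869))  # -> (0.193887, ~1.2314)
```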
If you want to tune the point cloud, I think you could change --prob_threshold. Also, I found that the downsized setting and the Fusibile post-processing both affect the reconstruction:
Downsized image + Fusibile post-proc:
Downsized image + Proposed post-proc:
Thanks for the quick response. I have two followup questions.
- I generated outputs using the steps you mentioned with downsized images (same values you mention above) and got an output for family which I think is similar to what you posted. However, a zoomed-out view of it shows plenty of the surrounding areas being reconstructed, as shown. Is this normal/expected?
- Using the same hyperparameters, I produced outputs for a few of the other scans and they don't look as expected. For example, here are two views of horse.
Does this mean we have to set hyperparameters for every scan of Tanks and Temples individually? Are the results in your paper produced with such individually picked values or do you use the same set of values across the dataset?
Could you check the input cameras and other settings again? The new cameras should have the tight depth range, but your reconstruction appears to use the wide depth range. The zoomed-out views of my reconstructions with the two provided commands and parameters look like:
Also, I use the rectified images I sent to you and do not pre-resize them. The test.py script will automatically do the resizing and cropping.
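In case anyone does pre-resize outside of test.py: the intrinsics must then be scaled together with the images (and shifted if you crop). A minimal sketch of what I mean (my own helper, not part of the repo):

```python
import numpy as np

def scale_intrinsics(K, scale_x, scale_y, crop_x=0, crop_y=0):
    """Adjust a 3x3 intrinsic matrix after resizing by (scale_x, scale_y)
    and cropping (crop_x, crop_y) pixels from the top-left corner."""
    K = np.array(K, dtype=np.float64).copy()
    K[0, 0] *= scale_x                    # fx
    K[1, 1] *= scale_y                    # fy
    K[0, 2] = K[0, 2] * scale_x - crop_x  # cx
    K[1, 2] = K[1, 2] * scale_y - crop_y  # cy
    return K
```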
For hyperparameters on the DTU evaluation, we use the parameters described in the paper. For Tanks and Temples, we fixed all parameters except the probability threshold (0.6 ± 0.2). This is because some of the scenes contain a large portion of background area and sky, and tuning the probability threshold effectively controls the filtering of these parts.
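To make the probability filtering concrete, it is essentially masking each depth map with its photometric confidence map before fusion; a minimal sketch (array names are illustrative, not the exact depthfusion.py code):

```python
import numpy as np

def filter_by_probability(depth_map, prob_map, prob_threshold=0.6):
    """Drop depth estimates whose confidence is below the threshold.
    A higher threshold removes more sky/background at the cost of completeness."""
    mask = prob_map >= prob_threshold
    return np.where(mask, depth_map, 0.0), mask
```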
I tried downloading the files you posted, freshly cloning the repo, and running the commands as-is, but I get all depth predictions as NaN and consequently an empty point cloud. Can you please verify the files you linked? There seems to be some issue.
@tejaskhot I see, I gave you the wrong link... here are the new cams
Sorry for my mistake!
Thanks! These files work and I am able to reproduce the results. I had one question regarding the full-resolution results. As reported in the paper, I tried using images of size 1920 x 1056 with D=256, interval=1, N=5 on a 16 GB GPU, but that also runs out of memory for me. How are you able to run these inferences at full resolution? Is there something I am missing?
Is the GPU also occupied by other applications (e.g., Chrome) during your experiments? I have encountered the OOM problem when only ~200 MB of memory was unavailable. BTW, my experiments were run on the Google Cloud ML platform with a P100 GPU.
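A quick way to check how much memory is actually free before launching test.py (this just wraps nvidia-smi, which you can of course call directly):

```python
import subprocess

def free_gpu_memory_mb(gpu_id=0):
    """Return free memory (MB) of the given GPU via nvidia-smi."""
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.free',
         '--format=csv,noheader,nounits', '-i', str(gpu_id)])
    return int(out.decode().strip())
```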
Thanks!
@tejaskhot Hi, why do Yao's results have the tight depth range while your reconstruction has a wide depth range when zoomed out?
Is that because of the depth range listed in the last line of cam.txt?
@whubaichuan I don't remember the specifics to be honest but that seems to be a fair guess. As @YoYo000 pointed out, the cam parameters and depth range are crucial for getting good results.
@tejaskhot Thanks for the reply. Have you tested different settings on the T&T leaderboard? Is prob_threshold the main factor influencing the results on T&T?
@YoYo000 I am sorry to bother you at this busy time, but I'd like to ask how I can reproduce the results for the Tanks and Temples dataset in the R-MVSNet paper.
I think the above images and the results in the MVSNet paper were made with the camera parameters in short_range_cameras_for_mvsnet that you provided in this repo. However, these short-range camera parameters are not provided for the advanced set (which is natural, because the scenes in the advanced set might not fit in short ranges).
So, I thought this means that the results in the R-MVSNet paper were made with the camera parameters NOT in short_range_cameras_for_mvsnet, namely the ones stored in the cams sub-folder of the folders with scene names such as Auditorium. However, as far as I tested, the reconstruction quality using these camera parameters was significantly lower than what I can see in the R-MVSNet paper.
So, I am wondering if you could share any tips for tuning the depth range for the R-MVSNet paper. Thank you very much for your help.
Hi @tatsy
Yes, you are right, the R-MVSNet paper uses the long-range cameras for reconstruction, for both the intermediate set and the advanced set. Only MVSNet uses short_range_cameras_for_mvsnet, as it is restricted to a small number of depth samples.
For benchmarking on the advanced set, the post-processing is important. From what I observed, the Fusibile point cloud is quite noisy, and I used the fusion + refinement strategy described in the paper to get the benchmarking result.
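Very roughly, the geometric part of that fusion is a reproject-and-compare test between neighboring depth maps; here is a single-pair sketch of the idea (a simplification of the strategy, not the exact benchmarking code; extrinsics are assumed to be 4x4 world-to-camera matrices as in the cam.txt files):

```python
import numpy as np

def check_geometric_consistency(depth_ref, K_ref, E_ref, depth_src, K_src, E_src,
                                pix_thresh=1.0, depth_thresh=0.01):
    """Reproject each reference pixel into the source view, read the source depth,
    project it back, and keep pixels whose round-trip error is small. A pixel is
    typically accepted if it passes this test for at least num_consistent views."""
    h, w = depth_ref.shape
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    pix = np.stack([x, y, np.ones_like(x)]).reshape(3, -1).astype(np.float64)

    # Back-project reference pixels to world coordinates.
    pts_cam = np.linalg.inv(K_ref) @ pix * depth_ref.reshape(1, -1)
    pts_world = np.linalg.inv(E_ref) @ np.vstack([pts_cam, np.ones((1, h * w))])

    # Project into the source view and sample its depth (nearest neighbor).
    proj = K_src @ (E_src @ pts_world)[:3]
    x_src, y_src = proj[0] / proj[2], proj[1] / proj[2]
    xs = np.clip(np.round(x_src).astype(int), 0, w - 1)
    ys = np.clip(np.round(y_src).astype(int), 0, h - 1)
    d_src = depth_src[ys, xs]

    # Lift the sampled source depth back to 3D and re-project into the reference view.
    src_cam = np.linalg.inv(K_src) @ np.stack([x_src, y_src, np.ones(h * w)]) * d_src
    src_world = np.linalg.inv(E_src) @ np.vstack([src_cam, np.ones((1, h * w))])
    back = K_ref @ (E_ref @ src_world)[:3]
    x_back, y_back, d_back = back[0] / back[2], back[1] / back[2], back[2]

    pix_err = np.hypot(x_back - pix[0], y_back - pix[1])
    rel_err = np.abs(d_back - depth_ref.reshape(-1)) / np.maximum(depth_ref.reshape(-1), 1e-8)
    return ((pix_err < pix_thresh) & (rel_err < depth_thresh)).reshape(h, w)
```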
Hi @YoYo000
Thank you very much for your reply. So, I guess the problem is the variational depth refinement in the R-MVSNet paper, because the results I got were rather sparse as well as noisy. Actually, Fusibile works pretty well for the DTU dataset with MVSNet (not R-MVSNet), and moreover, the behavior of my R-MVSNet is quite similar to that in the second row of Table 3 of the R-MVSNet paper (the refinement-ablated variant).
I have already implemented the variational depth refinement, but it is quite unstable during gradient descent. As I posted in another issue, I am wondering how the ZNCC and bilateral smoothing terms are each defined and weighted.
#35 (comment)
Concretely, my questions are:
- Is ZNCC used as the data term as-is? It is not wrapped as exp(-ZNCC) or something else? (My own patch-wise ZNCC is sketched after this list.)
- Are the weights for the data term (ZNCC) and the bilateral smoothing term both 1? Or are they different, e.g., weight 1 for the data term and 0.01 or so for the smoothing term?
- Are the neighbors for the bilateral smoothing term (N(p_1) in the paper) 4-neighbor pixels, 8-neighbor pixels, or larger?
Thank you very much for your help.