doubleZ0108/GeoMVSNet

Custom dataset


Hello,

I'm interested in running GeoMVSNet on my own dataset but am a bit unclear on how to proceed. I've read the README.md, but some parts are not entirely clear to me. My custom dataset doesn't include ground truth; it's purely for testing and consists of both indoor and outdoor scenes, covering roughly a city block's worth of distance.

You mentioned in section 3.3 (Custom Data) of the README that:

GeoMVSNet can reconstruct on custom data. At present, you can refer to MVSNet to organize your data, and refer to the same steps as above for depth estimation and point cloud fusion.

I understand that I need to organize my data according to MVSNet. However, could you please clarify what you mean by "the same steps as above"? Are you referring to the steps outlined in section 3.2 (Tanks and Temples)?

I'm currently unsure about how to apply your code to my dataset, so a detailed guide on the necessary steps would be greatly appreciated.

Thank you.

Hi, sorry that part is still marked as @TODO in the README; I have been busy these days.

I can give you a general process here:

  1. Pre-process your images (if you captured a video, extract frames from it first).
  2. Use COLMAP to estimate camera poses. (Here is an instruction in Chinese.)
  3. Use the colmap2mvsnet script to convert the camera poses from COLMAP format to the MVSNet-style format.
  4. Organize the images, camera parameters, and depth range into the same style as the public datasets we provide (see the sketch after this list).
  5. Modify the test scripts (test_dtu.sh & fusion_dtu.sh).
  6. Use our pre-trained model for depth estimation and fusion.
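To make step 4 concrete, below is a minimal Python sketch of the MVSNet-style layout (`images/`, `cams/`, `pair.txt`) that the colmap2mvsnet script normally generates for you. The scene name, intrinsics, depth range, and view scores in it are placeholder values for illustration only, not values taken from our code; use it just to check that your folders look as expected.

```python
# Minimal sketch of an MVSNet-style scene folder (placeholder values throughout):
#
# my_scene/
#   images/00000000.jpg, 00000001.jpg, ...
#   cams/00000000_cam.txt, ...
#   pair.txt
import os
import numpy as np

scene = "my_scene"  # placeholder scene folder
os.makedirs(os.path.join(scene, "cams"), exist_ok=True)

def write_cam(path, extrinsic, intrinsic, depth_min, depth_interval):
    """Write one camera file in the MVSNet text format:
    a 4x4 world-to-camera extrinsic, a 3x3 intrinsic, and a depth range line."""
    with open(path, "w") as f:
        f.write("extrinsic\n")
        for row in extrinsic:
            f.write(" ".join(f"{v:.6f}" for v in row) + "\n")
        f.write("\nintrinsic\n")
        for row in intrinsic:
            f.write(" ".join(f"{v:.6f}" for v in row) + "\n")
        f.write(f"\n{depth_min} {depth_interval}\n")

# Example with an identity pose and a guessed depth range
# (replace with the values produced by COLMAP / colmap2mvsnet).
write_cam(os.path.join(scene, "cams", "00000000_cam.txt"),
          np.eye(4),
          np.array([[1000.0, 0.0, 960.0],
                    [0.0, 1000.0, 540.0],
                    [0.0, 0.0, 1.0]]),
          depth_min=0.5, depth_interval=0.01)

# pair.txt lists, for each reference view, its source views and matching scores.
with open(os.path.join(scene, "pair.txt"), "w") as f:
    f.write("2\n")              # total number of views
    f.write("0\n1 1 100.0\n")   # ref view 0: one source (view 1, score 100.0)
    f.write("1\n1 0 100.0\n")   # ref view 1: one source (view 0, score 100.0)
```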

I hope these hints help you run the pipeline on your own dataset.

When I have time, I will record a video showing how to construct a custom dataset.

Hi @vietpho.

First, congratulations: the point cloud results you presented show that you have successfully run our model. And honestly, I think the result is acceptable, since the checkpoint we released was trained on the DTU dataset, which mainly contains small objects.

However, your data is a large room (open space, glass surfaces, lighting changes, etc.). Maybe you should try finetuning our pre-trained model on large-scale datasets like BlendedMVS or other datasets (7-Scenes?).

There are several other reasons for the above results, including:

  1. The camera parameters estimated by COLMAP are not guaranteed to be accurate.
  2. The video may not have been shot with controlled exposure, etc., so the extracted frames may be blurred.
  3. The depth range assumption is not accurate. This is also why the depth map visualization failed; try specifying the v_min and v_max of the visualization (see the sketch after this list).
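On the v_min / v_max point in item 3, here is a hedged Python sketch of how one might clamp the display range when visualizing an estimated depth map. The file path and percentile choices are placeholders, and the loading step will differ if your depth maps are stored as .pfm (the usual MVSNet convention) rather than .npy.

```python
# Clamp the color range when visualizing an estimated depth map, so an off
# depth-range assumption does not wash the whole image out.
import numpy as np
import matplotlib.pyplot as plt

depth = np.load("00000000_depth.npy")  # placeholder path; load your depth map here

# Ignore invalid pixels (zeros) when choosing the display range.
valid = depth[depth > 0]
v_min, v_max = np.percentile(valid, 2), np.percentile(valid, 98)

plt.imshow(depth, cmap="viridis", vmin=v_min, vmax=v_max)
plt.colorbar(label="depth")
plt.savefig("depth_vis.png", dpi=200)
```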

I don't think using the model we provide directly is suitable for your use case. You can try finetuning our base model, or use other methods (NeuralRecon and SimpleRecon are commonly used to reconstruct rooms).