Code of MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR 2023)
- Releasing training and testing code
- Adding dynamic point cloud fusion for T&T
- Releasing pre-trained models
git clone https://github.com/ewrfcas/MVSFormer.git
cd MVSFormer
pip install -r requirements.txt
We also highly recommend installing fusibile (https://github.com/YoYo000/fusibile) for depth fusion.
git clone https://github.com/YoYo000/fusibile.git
cd fusibile
cmake .
make
Tip: revise CUDA_NVCC_FLAGS in CMakeLists.txt according to the GPU you use.
We set -gencode arch=compute_70,code=sm_70
instead of -gencode arch=compute_60,code=sm_60
for V100 GPUs.
For other GPU types, use the following:
# 1080Ti
-gencode arch=compute_60,code=sm_60
# 2080Ti
-gencode arch=compute_75,code=sm_75
# 3090Ti
-gencode arch=compute_86,code=sm_86
# V100
-gencode arch=compute_70,code=sm_70
More compute capability mappings can be found here.
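If you are not sure which compute capability your GPU has, PyTorch (already a dependency of this project) can report it directly; this is just a convenience check, not part of the build:

```python
import torch

# Print the compute capability of each visible GPU,
# e.g. (7, 0) on a V100 -> use -gencode arch=compute_70,code=sm_70
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> sm_{major}{minor}")
```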
- Download the preprocessed camera poses from DTU training data, and the depth maps from Depths_raw.
- We also need original rectified images from the official website.
- The DTU testing set can be downloaded from MVSNet.
dtu_training
├── Cameras
├── Depths
├── Depths_raw
└── DTU_origin/Rectified (downloaded from the official website, original image size)
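As a quick sanity check that your layout matches the tree above (the root path is illustrative; adjust it to your download location):

```python
import os

dtu_root = "./dtu_training"  # adjust to where you unpacked the data
for sub in ["Cameras", "Depths", "Depths_raw", "DTU_origin/Rectified"]:
    path = os.path.join(dtu_root, sub)
    print(f"{path}: {'found' if os.path.isdir(path) else 'MISSING'}")
```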
Download the high-resolution images from BlendedMVS.
BlendedMVS_raw
├── 57f8d9bbe73f6760f10e916a
│   └── 57f8d9bbe73f6760f10e916a
│       ├── blended_images
│       ├── cams
│       └── rendered_depth_maps
Download the T&T dataset pre-processed by MVSNet.
Note that users should use the short depth range of the cameras when running the evaluation script to produce the point clouds.
Remember to replace the cameras in the intermediate folder with those from short_range_caemeras_for_mvsnet.zip (a minimal replacement sketch follows the directory tree below).
tankandtemples
├── advanced
│ ├── Auditorium
│ ├── Ballroom
│ ├── ...
│ └── Temple
└── intermediate
├── Family
├── Francis
├── ...
├── Train
└── short_range_cameras
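A minimal sketch of the camera replacement, assuming the zip unpacks to per-scene folders of *_cam.txt files mirroring each scene's cams folder (the actual layout inside the zip may differ, so adapt the paths accordingly):

```python
import glob
import os
import shutil

tt_root = "./tankandtemples/intermediate"            # adjust to your setup
short_root = os.path.join(tt_root, "short_range_cameras")

# Hypothetical layout: short_range_cameras/<scene>/*_cam.txt mirroring <scene>/cams/
for scene_dir in sorted(glob.glob(os.path.join(short_root, "*"))):
    scene = os.path.basename(scene_dir)
    dst_dir = os.path.join(tt_root, scene, "cams")
    for cam_file in glob.glob(os.path.join(scene_dir, "*_cam.txt")):
        shutil.copy(cam_file, os.path.join(dst_dir, os.path.basename(cam_file)))
        print(f"replaced {scene}/cams/{os.path.basename(cam_file)}")
```

The short-range cameras only change the depth-range line of each MVSNet-style cam.txt (the last line, holding DEPTH_MIN and DEPTH_INTERVAL), so the replacement is a straight file copy.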
DINO-small (https://github.com/facebookresearch/dino): Weight Link
Twins-small (https://github.com/Meituan-AutoML/Twins): Weight Link
Training MVSFormer (Twins-based) on DTU with two 32GB V100 GPUs takes about 2 days. We set max epoch=15 on DTU, but the best checkpoint was reached around epoch 10 in our implementation. You are free to adjust the max epoch, but note that this also changes the learning rate decay schedule.
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer.json \
--exp_name MVSFormer \
--data_path ${YOUR_DTU_PATH} \
--DDP
Training MVSFormer-P (frozen DINO-based):
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer-p.json \
--exp_name MVSFormer-p \
--data_path ${YOUR_DTU_PATH} \
--DDP
The model should be fine-tuned on BlendedMVS before testing on T&T.
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer_blendmvs.json \
--exp_name MVSFormer-blendedmvs \
--data_path ${YOUR_BLENDEDMVS_PATH} \
--dtu_model_path ${YOUR_DTU_MODEL_PATH} \
--DDP
Pretrained models: OneDrive
For testing on DTU:
CUDA_VISIBLE_DEVICES=0 python test.py --dataset dtu --batch_size 1 \
--testpath ${dtu_test_path} \
--testlist ./lists/dtu/test.txt \
--resume ${MODEL_WEIGHT_PATH} \
--outdir ${OUTPUT_DIR} \
--fusibile_exe_path ./fusibile/fusibile \
--interval_scale 1.06 --num_view 5 \
--numdepth 192 --max_h 1152 --max_w 1536 --filter_method gipuma \
--disp_threshold 0.1 --num_consistent 2 \
--prob_threshold 0.5,0.5,0.5,0.5 \
--combine_conf --tmps 5.0,5.0,5.0,1.0
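--prob_threshold gives one confidence threshold per coarse-to-fine stage, and --tmps gives the per-stage softmax temperatures of the temperature-based depth prediction. As a rough, illustrative sketch of the general idea only (not the repository's exact implementation, and assuming the temperature multiplies the logits so that larger values sharpen the distribution):

```python
import torch

def expected_depth(logits: torch.Tensor, depth_values: torch.Tensor, t: float) -> torch.Tensor:
    """Temperature-scaled soft-argmax over D depth hypotheses.

    logits:       (B, D, H, W) per-hypothesis scores
    depth_values: (B, D, H, W) depth hypothesis of each bin
    t:            temperature; larger t -> sharper, more classification-like
    """
    prob = torch.softmax(logits * t, dim=1)          # (B, D, H, W)
    return torch.sum(prob * depth_values, dim=1)     # (B, H, W) expected depth

# Toy usage: 192 hypotheses on a 4x4 map with an arbitrary depth range
logits = torch.randn(1, 192, 4, 4)
depths = torch.linspace(425.0, 935.0, 192).view(1, 192, 1, 1).expand(1, 192, 4, 4)
print(expected_depth(logits, depths, t=5.0).shape)   # torch.Size([1, 4, 4])
```

Under this reading, t=5.0 at the coarse stages behaves more like classification (near-argmax), while t=1.0 at the finest stage keeps a smoother, regression-like expectation.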
For testing on T&T: T&T uses dynamic point cloud fusion (dpcd), whose confidence is controlled by conf rather than prob_threshold. Sorry for the confusing parameter names, a legacy quirk of this project.
Note that we recommend num_view=20 here, but you should then build a new pair.txt with 20 source views following MVSNet; a parser sketch of the pair.txt format follows.
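pair.txt follows the MVSNet convention: the first line is the number of views; then, for each reference view, one line with its id and one line of the form N src_0 score_0 src_1 score_1 .... A minimal parser sketch, so you can verify your rebuilt file exposes at least 20 source views per reference (the selection scores themselves should be recomputed as in MVSNet):

```python
def read_pair_file(path: str):
    """Parse an MVSNet-style pair.txt into (ref_view, [src_views]) tuples."""
    pairs = []
    with open(path) as f:
        num_views = int(f.readline())
        for _ in range(num_views):
            ref = int(f.readline())
            tokens = f.readline().split()
            n = int(tokens[0])
            srcs = [int(tokens[1 + 2 * i]) for i in range(n)]  # ids at odd offsets, scores between
            pairs.append((ref, srcs))
    return pairs

for ref, srcs in read_pair_file("pair.txt")[:3]:
    print(ref, "->", srcs[:20])  # num_view=20 needs >= 20 source views per reference
```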
CUDA_VISIBLE_DEVICES=0 python test.py --dataset tt --batch_size 1 \
--testpath ${tt_test_path}/intermediate(or advanced) \
--testlist ./lists/tanksandtemples/intermediate.txt(or advanced.txt) \
--resume ${MODEL_WEIGHT_PATH} \
--outdir ${OUTPUT_DIR} \
--interval_scale 1.0 --num_view 10 --numdepth 256 \
--max_h 1088 --max_w 1920 --filter_method dpcd \
--prob_threshold 0.5,0.5,0.5,0.5 \
--use_short_range --combine_conf --tmps 5.0,5.0,5.0,1.0
If you find our project helpful, please consider citing:
@article{caomvsformer,
  title={MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth},
  author={Cao, Chenjie and Ren, Xinlin and Fu, Yanwei},
  journal={Transactions of Machine Learning Research},
  year={2023}
}
Our codes are partially based on CDS-MVSNet, DINO, and Twins.