Code of MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR 2023)
- Releasing training and testing code
- Adding dynamic point cloud fusion for T&T
- Releasing pre-trained models
git clone https://github.com/ewrfcas/MVSFormer.git
cd MVSFormer
pip install -r requirements.txt
We also highly recommend installing fusibile (https://github.com/YoYo000/fusibile) for depth fusion.
git clone https://github.com/YoYo000/fusibile.git
cd fusibile
cmake .
make
Tip: revise CUDA_NVCC_FLAGS in CMakeLists.txt according to the GPU you use.
We set -gencode arch=compute_70,code=sm_70
instead of -gencode arch=compute_60,code=sm_60
for V100 GPUs.
For other GPU types, use the following:
# 1080Ti
-gencode arch=compute_60,code=sm_60
# 2080Ti
-gencode arch=compute_75,code=sm_75
# 3090Ti
-gencode arch=compute_86,code=sm_86
# V100
-gencode arch=compute_70,code=sm_70
More compute capability mappings can be found here.
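If you are not sure which compute capability your GPU has, PyTorch (already a dependency of this project) can report it directly; this is just a convenience check, not part of the build:

```python
import torch

# Print the compute capability of each visible GPU,
# e.g. (7, 0) on a V100 -> use -gencode arch=compute_70,code=sm_70
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> sm_{major}{minor}")
```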
- Download the preprocessed camera poses from DTU training data, and the depth maps from Depths_raw.
- We also need original rectified images from the official website.
- The DTU testing set can be downloaded from MVSNet.
dtu_training
├── Cameras
├── Depths
├── Depths_raw
└── DTU_origin/Rectified (downloaded from the official website, original image size)
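As a quick sanity check that your layout matches the tree above (the root path is illustrative; adjust it to your download location):

```python
import os

dtu_root = "./dtu_training"  # adjust to where you unpacked the data
for sub in ["Cameras", "Depths", "Depths_raw", "DTU_origin/Rectified"]:
    path = os.path.join(dtu_root, sub)
    print(f"{path}: {'found' if os.path.isdir(path) else 'MISSING'}")
```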
Download the high-resolution images from BlendedMVS.
BlendedMVS_raw
├── 57f8d9bbe73f6760f10e916a
│   └── 57f8d9bbe73f6760f10e916a
│       ├── blended_images
│       ├── cams
│       └── rendered_depth_maps
Download the T&T dataset pre-processed by MVSNet.
Note that users should use the short depth range of the cameras when running the evaluation script to produce the point clouds.
Remember to replace the cameras in the intermediate folder with those from short_range_caemeras_for_mvsnet.zip (a minimal replacement sketch follows the directory tree below).
tankandtemples
├── advanced
│ ├── Auditorium
│ ├── Ballroom
│ ├── ...
│ └── Temple
└── intermediate
├── Family
├── Francis
├── ...
├── Train
└── short_range_cameras
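A minimal sketch of the camera replacement, assuming the zip unpacks to per-scene folders of *_cam.txt files mirroring each scene's cams folder (the actual layout inside the zip may differ, so adapt the paths accordingly):

```python
import glob
import os
import shutil

tt_root = "./tankandtemples/intermediate"            # adjust to your setup
short_root = os.path.join(tt_root, "short_range_cameras")

# Hypothetical layout: short_range_cameras/<scene>/*_cam.txt mirroring <scene>/cams/
for scene_dir in sorted(glob.glob(os.path.join(short_root, "*"))):
    scene = os.path.basename(scene_dir)
    dst_dir = os.path.join(tt_root, scene, "cams")
    for cam_file in glob.glob(os.path.join(scene_dir, "*_cam.txt")):
        shutil.copy(cam_file, os.path.join(dst_dir, os.path.basename(cam_file)))
        print(f"replaced {scene}/cams/{os.path.basename(cam_file)}")
```

The short-range cameras only change the depth-range line of each MVSNet-style cam.txt (the last line, holding DEPTH_MIN and DEPTH_INTERVAL), so the replacement is a straight file copy.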
DINO-small (https://github.com/facebookresearch/dino): Weight Link
Twins-small (https://github.com/Meituan-AutoML/Twins): Weight Link
Training MVSFormer (Twins-based) on DTU with two 32GB V100 GPUs takes about 2 days. We set max epoch=15 on DTU, but the best checkpoint was reached around epoch 10 in our implementation. You are free to adjust the max epoch, but note that this also changes the learning rate decay schedule.
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer.json \
--exp_name MVSFormer \
--data_path ${YOUR_DTU_PATH} \
--DDP
Training MVSFormer-P (frozen DINO-based):
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer-p.json \
--exp_name MVSFormer-p \
--data_path ${YOUR_DTU_PATH} \
--DDP
The model should be fine-tuned on BlendedMVS before testing on T&T.
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer_blendmvs.json \
--exp_name MVSFormer-blendedmvs \
--data_path ${YOUR_BLENDEDMVS_PATH} \
--dtu_model_path ${YOUR_DTU_MODEL_PATH} \
--DDP
Pretrained models: OneDrive
For testing on DTU:
CUDA_VISIBLE_DEVICES=0 python test.py --dataset dtu --batch_size 1 \
--testpath ${dtu_test_path} \
--testlist ./lists/dtu/test.txt \
--resume ${MODEL_WEIGHT_PATH} \
--outdir ${OUTPUT_DIR} \
--fusibile_exe_path ./fusibile/fusibile \
--interval_scale 1.06 --num_view 5 \
--numdepth 192 --max_h 1152 --max_w 1536 --filter_method gipuma \
--disp_threshold 0.1 --num_consistent 2 \
--prob_threshold 0.5,0.5,0.5,0.5 \
--combine_conf --tmps 5.0,5.0,5.0,1.0
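--prob_threshold gives one confidence threshold per coarse-to-fine stage, and --tmps gives the per-stage softmax temperatures of the temperature-based depth prediction. As a rough, illustrative sketch of the general idea only (not the repository's exact implementation, and assuming the temperature multiplies the logits so that larger values sharpen the distribution):

```python
import torch

def expected_depth(logits: torch.Tensor, depth_values: torch.Tensor, t: float) -> torch.Tensor:
    """Temperature-scaled soft-argmax over D depth hypotheses.

    logits:       (B, D, H, W) per-hypothesis scores
    depth_values: (B, D, H, W) depth hypothesis of each bin
    t:            temperature; larger t -> sharper, more classification-like
    """
    prob = torch.softmax(logits * t, dim=1)          # (B, D, H, W)
    return torch.sum(prob * depth_values, dim=1)     # (B, H, W) expected depth

# Toy usage: 192 hypotheses on a 4x4 map with an arbitrary depth range
logits = torch.randn(1, 192, 4, 4)
depths = torch.linspace(425.0, 935.0, 192).view(1, 192, 1, 1).expand(1, 192, 4, 4)
print(expected_depth(logits, depths, t=5.0).shape)   # torch.Size([1, 4, 4])
```

Under this reading, t=5.0 at the coarse stages behaves more like classification (near-argmax), while t=1.0 at the finest stage keeps a smoother, regression-like expectation.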
For testing on T&T: T&T uses dynamic point cloud fusion (dpcd), whose confidence is controlled by conf rather than prob_threshold. Sorry for the confusing parameter names, a legacy quirk of this project.
Note that we recommend num_view=20 here, but you should then build a new pair.txt with 20 source views following MVSNet; a parser sketch of the pair.txt format follows.
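pair.txt follows the MVSNet convention: the first line is the number of views; then, for each reference view, one line with its id and one line of the form N src_0 score_0 src_1 score_1 .... A minimal parser sketch, so you can verify your rebuilt file exposes at least 20 source views per reference (the selection scores themselves should be recomputed as in MVSNet):

```python
def read_pair_file(path: str):
    """Parse an MVSNet-style pair.txt into (ref_view, [src_views]) tuples."""
    pairs = []
    with open(path) as f:
        num_views = int(f.readline())
        for _ in range(num_views):
            ref = int(f.readline())
            tokens = f.readline().split()
            n = int(tokens[0])
            srcs = [int(tokens[1 + 2 * i]) for i in range(n)]  # ids at odd offsets, scores between
            pairs.append((ref, srcs))
    return pairs

for ref, srcs in read_pair_file("pair.txt")[:3]:
    print(ref, "->", srcs[:20])  # num_view=20 needs >= 20 source views per reference
```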
CUDA_VISIBLE_DEVICES=0 python test.py --dataset tt --batch_size 1 \
--testpath ${tt_test_path}/intermediate(or advanced) \
--testlist ./lists/tanksandtemples/intermediate.txt(or advanced.txt) \
--resume ${MODEL_WEIGHT_PATH} \
--outdir ${OUTPUT_DIR} \
--interval_scale 1.0 --num_view 10 --numdepth 256 \
--max_h 1088 --max_w 1920 --filter_method dpcd \
--prob_threshold 0.5,0.5,0.5,0.5 \
--use_short_range --combine_conf --tmps 5.0,5.0,5.0,1.0
If you find our project helpful, please consider citing:
@article{caomvsformer,
  title={MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth},
  author={Cao, Chenjie and Ren, Xinlin and Fu, Yanwei},
  journal={Transactions of Machine Learning Research},
  year={2023}
}
Our codes are partially based on CDS-MVSNet, DINO, and Twins.