- Releasing code for training and testing
- Releasing pre-trained models trained on DTU
- Releasing pre-trained models fine-tuned on Tanks&Temples and test code
- Releasing point clouds of DTU
Please first see FlashAttention2 for original requirements and compilation instructions.
git clone https://github.com/ewrfcas/MVSFormer.git
cd MVSFormer
pip install -r requirements.txt
We also highly recommend installing fusibile (https://github.com/YoYo000/fusibile) for depth fusion.
git clone https://github.com/YoYo000/fusibile.git
cd fusibile
cmake .
make
Tips: You should revise CUDA_NVCC_FLAGS in CMakeLists.txt according to the GPU device you use. We set -gencode arch=compute_70,code=sm_70
instead of -gencode arch=compute_60,code=sm_60
for V100 GPUs. For other GPU types, you can follow:
# 1080Ti
-gencode arch=compute_60,code=sm_60
# 2080Ti
-gencode arch=compute_75,code=sm_75
# 3090Ti
-gencode arch=compute_86,code=sm_86
# V100
-gencode arch=compute_70,code=sm_70
More compute-capability mappings can be found here.
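If you are unsure of your GPU's compute capability, you can query it through PyTorch (already required by this repository). The snippet below is a minimal sketch, not part of the codebase, that prints the matching -gencode flag for each local GPU:

# Minimal sketch (not part of this repository): print the -gencode flag
# matching each local GPU, queried through PyTorch.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        print(f"{name}: -gencode arch=compute_{major}{minor},code=sm_{major}{minor}")
else:
    print("No CUDA device detected.")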
Please refer to MVSFormer.
Training MVSFormer++ on DTU with 4 48GB A6000 GPUs takes around 1 day. We set the maximum epoch to 15 for DTU.
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --config config/mvsformer++.json \
--exp_name MVSFormer++ \
--DDP
The model should be fine-tuned on BlendedMVS before testing on Tanks&Temples.
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --config ./saved/models/DINOv2/mvsformer++/mvsformer++_ft.json \
--exp_name MVSFormer++_blendedmvs_dtu_mixed_M0 \
--dataloader_type "BlendedLoader" \
--data_path ${YOUR_BLENDEDMVS_PATH} \
--dtu_model_path ./saved/models/DINOv2/mvsformer++/model_best.pth \
--finetune \
--balanced_training \
--DDP
Pretrained models and additional pair.txt: OneDrive
For testing on DTU:
CUDA_VISIBLE_DEVICES=0 python test.py --dataset dtu --batch_size 1 \
--testpath ${dtu_test_path} --testlist ./lists/dtu/test.txt \
--resume ${MODEL_WEIGHT_PATH} \
--outdir ${OUTPUT_DIR} --interval_scale 1.06 --num_view 5 \
--numdepth 192 --max_h 1152 --max_w 1536 --filter_method gipuma \
--disp_threshold 0.1 --num_consistent 2 --prob_threshold 0.5
MVSFormer++ requires camera parameters and a view-selection file. If you do not have them, you can use Colmap to estimate the cameras and convert them to the MVSNet format with colmap2mvsnet.py. Please arrange your files as follows.
- <dense_folder>
- images_col # input images of Colmap
- sparse_col # SfM output from colmap in .txt format
- cams # output MVSNet cameras, to be generated
- images # output MVSNet input images, to be generated
- pair.txt # output view selection file, to be generated
An example of running Colmap:
colmap feature_extractor \
--database_path <dense_folder>/database.db \
--image_path <dense_folder>/images_col
colmap exhaustive_matcher \
--database_path <dense_folder>/database.db
colmap mapper \
--database_path <dense_folder>/database.db \
--image_path <dense_folder>/images_col \
--output_path <dense_folder>/sparse_col
colmap model_converter \
--input_path <dense_folder>/sparse_col/0 \
--output_path <dense_folder>/sparse_col \
--output_type TXT
Then run colmap2mvsnet.py:
python colmap2mvsnet.py --dense_folder <dense_folder> --max_d 256 --convert_format
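The generated pair.txt follows the standard MVSNet view-selection layout: the total view count, then for each reference view a line with its index and a line with the number of source views followed by (source id, score) pairs. The reader below is a minimal sketch, assuming that layout, for inspecting which source views were selected:

# Minimal sketch, assuming the standard MVSNet pair.txt layout:
#   line 1: total number of views
#   then, per reference view: one line with its index, and one line with
#   "num_src src_id_1 score_1 src_id_2 score_2 ..."
def read_pair_file(path):
    pairs = []
    with open(path) as f:
        num_views = int(f.readline())
        for _ in range(num_views):
            ref = int(f.readline())
            tokens = f.readline().split()
            num_src = int(tokens[0])
            src_views = [int(tokens[1 + 2 * i]) for i in range(num_src)]
            pairs.append((ref, src_views))
    return pairs

for ref, srcs in read_pair_file("<dense_folder>/pair.txt"):
    print(f"ref view {ref:3d} -> source views {srcs[:5]}")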
Please note that the resolution of the input images must be divisible by 64; you can change the max_h and max_w parameters accordingly.
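For example, a hypothetical helper like the one below (not part of the repository) rounds a resolution down to the nearest multiple of 64 before passing it as --max_h / --max_w:

# Minimal sketch (hypothetical helper, not part of the repository):
# round an image resolution down to the nearest multiple of 64.
def round_down_to_multiple(value, base=64):
    return (value // base) * base

# e.g. a 1080x1920 input could be tested with --max_h 1024 --max_w 1920
print(round_down_to_multiple(1080), round_down_to_multiple(1920))  # 1024 1920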
To test on your own dataset:
CUDA_VISIBLE_DEVICES=0 python test.py --dataset dtu --batch_size 1 \
--testpath ${scene_test_path} --testlist ${scene_test_list} \
--resume ${MODEL_WEIGHT_PATH} \
--outdir ${OUTPUT_DIR} --interval_scale 1.06 --num_view 5 \
--numdepth 192 --max_h ${max_h} --max_w ${max_w} --filter_method dpcd \
--conf 0.5
If you find our work helpful, please consider citing:
@inproceedings{cao2024mvsformer++,
title={MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo},
author={Chenjie Cao and Xinlin Ren and Yanwei Fu},
booktitle={International Conference on Learning Representations (ICLR)},
year={2024}
}
We borrow code from VisMVSNet and MVSFormer. We are grateful for these works' contributions!