/ET-MVSNet

[ICCV 2023] When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo

Primary LanguagePythonMIT LicenseMIT

ET-MVSNet

When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo
Authors: Tianqi Liu, Xinyi Ye, Weiyue Zhao, Zhiyu Pan, Min Shi*, Zhiguo Cao
Institute: Huazhong University of Science and Technology
ICCV 2023

Abstract

Learning-based multi-view stereo (MVS) method heavily relies on feature matching, which requires distinctive and descriptive representations. An effective solution is to apply non-local feature aggregation, e.g., Transformer. Albeit useful, these techniques introduce heavy computation overheads for MVS. Each pixel densely attends to the whole image. In contrast, we propose to constrain non-local feature augmentation within a pair of lines: each point only attends the corresponding pair of epipolar lines. Our idea takes inspiration from the classic epipolar geometry, which shows that one point with different depth hypotheses will be projected to the epipolar line on the other view. This constraint reduces the 2D search space into the epipolar line in stereo matching. Similarly, this suggests that the matching of MVS is to distinguish a series of points lying on the same line. Inspired by this point-to-line search, we devise a line-to-point non-local augmentation strategy. We first devise an optimized searching algorithm to split the 2D feature maps into epipolar line pairs. Then, an Epipolar Transformer (ET) performs non-local feature augmentation among epipolar line pairs. We incorporate the ET into a learning-based MVS baseline, named ET-MVSNet. ET-MVSNet achieves state-of-the-art reconstruction performance on both the DTU and Tanks-and-Temples benchmark with high efficiency.

Installation

conda create -n etmvsnet python=3.10.8
conda activate etmvsnet
pip install -r requirements.txt
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 -f https://download.pytorch.org/whl/torch_stable.html

Data Preparation

1. DTU Dataset

Training data. We use the same DTU training data as mentioned in MVSNet and CasMVSNet. Download DTU training data and Depth raw. Unzip and organize them as:

dtu_training                     
    ├── Cameras                
    ├── Depths   
    ├── Depths_raw
    └── Rectified

Testing Data. Download DTU testing data. Unzip it as:

dtu_testing                                       
    ├── scan1   
    ├── scan4
    ├── ...

2. BlendedMVS Dataset

Download BlendedMVS and unzip it as:

blendedmvs                          
    ├── 5a0271884e62597cdee0d0eb                
    ├── 5a3ca9cb270f0e3f14d0eddb   
    ├── ...
    ├── training_list.txt
    ├── ...

3. Tanks and Temples Dataset

Download Tanks and Temples and unzip it as:

tanksandtemples                          
       ├── advanced                 
       │   ├── Auditorium       
       │   ├── ...  
       └── intermediate
           ├── Family       
           ├── ...          

We use the camera parameters of short depth range version (included in your download), you should replace the cams folder in intermediate folder with the short depth range version manually.

Or you can download our processed data here.

We recommend the latter.

Training

Training on DTU

To train the model from scratch on DTU, specify DTU_TRAINING in ./scripts/train_dtu.sh first and then run:

bash scripts/train_dtu.sh exp_name

After training, you will get model checkpoints in ./checkpoints/dtu/exp_name.

Finetune on BlendedMVS

To fine-tune the model on BlendedMVS, you need specify BLD_TRAINING and BLD_CKPT_FILE in ./scripts/train_bld.sh first, then run:

bash scripts/train_bld.sh exp_name

Testing

Testing on DTU

For DTU testing, we use the model (pretrained model) trained on DTU training dataset. Specify DTU_TESTPATH and DTU_CKPT_FILE in ./scripts/test_dtu.sh first, then run the following command to generate point cloud results.

bash scripts/test_dtu.sh exp_name

For quantitative evaluation, download SampleSet and Points from DTU's website. Unzip them and place Points folder in SampleSet/MVS Data/. The structure is just like:

SampleSet
├──MVS Data
      └──Points

Specify datapath, plyPath, resultsPath in evaluations/dtu/BaseEvalMain_web.m and datapath, resultsPath in evaluations/dtu/ComputeStat_web.m, then run the following command to obtain the quantitative metics.

cd evaluations/dtu
matlab -nodisplay
BaseEvalMain_web 
ComputeStat_web

Testing on Tanks and Temples

We recommend using the finetuned model (pretrained model) to test on Tanks and Temples benchmark. Similarly, specify TNT_TESTPATH and TNT_CKPT_FILE in scripts/test_tnt_inter.sh and scripts/test_tnt_adv.sh. To generate point cloud results, just run:

bash scripts/test_tnt_inter.sh exp_name
bash scripts/test_tnt_adv.sh exp_name

For quantitative evaluation, you can upload your point clouds to Tanks and Temples benchmark.

Citation

@InProceedings{Liu_2023_ICCV,
    author    = {Liu, Tianqi and Ye, Xinyi and Zhao, Weiyue and Pan, Zhiyu and Shi, Min and Cao, Zhiguo},
    title     = {When Epipolar Constraint Meets Non-Local Operators in Multi-View Stereo},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {18088-18097}

Acknowledgements

Our work is partially based on these opening source work: MVSNet, cascade-stereo, MVSTER.

We appreciate their contributions to the MVS community.