This repository contains a PyTorch Lightning implementation of the paper MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo. Our work presents a novel neural rendering approach that can efficiently reconstruct
geometric and neural radiance fields for view synthesis. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned, which leads to fast per-scene reconstruction.
Install environment:
pip install pytorch-lightning inplace_abn
pip install imageio pillow scikit-image opencv-python configargparse lpips
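To confirm that the packages above are importable before launching a long training run, a small check like the following may help. It is a sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and the table maps each pip package name to the module name it is imported under (for example `opencv-python` is imported as `cv2`, `scikit-image` as `skimage`, `pillow` as `PIL`).

```python
import importlib.util

# pip package name -> import name for the dependencies listed above.
REQUIRED_MODULES = {
    "pytorch-lightning": "pytorch_lightning",
    "inplace_abn": "inplace_abn",
    "imageio": "imageio",
    "pillow": "PIL",
    "scikit-image": "skimage",
    "opencv-python": "cv2",
    "configargparse": "configargparse",
    "lpips": "lpips",
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the pip names whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All dependencies found.")
```

`find_spec` only locates the module without importing it, so the check is cheap even for heavy packages.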
Please see each subsection for training on different datasets. Available training datasets:
- DTU
- Blender (Realistic Synthetic)
- LLFF (Real Forward-Facing)
- Your own data (bundles of images, intrinsics, extrinsics, and near/far bounds)
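For your own data, each view needs an image, a camera intrinsic matrix, a camera extrinsic matrix, and near/far depth bounds. The sketch below is one hypothetical way to hold and sanity-check such a bundle before writing a dataset loader; the class name `ViewBundle` and its field names are illustrative assumptions, not the repository's data format.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ViewBundle:
    """Hypothetical container for one view of a custom scene."""
    image: np.ndarray       # H x W x 3 RGB image
    intrinsic: np.ndarray   # 3 x 3 pinhole matrix K
    extrinsic: np.ndarray   # 4 x 4 camera pose (c2w or w2c, OpenCV convention)
    near: float             # near depth bound
    far: float              # far depth bound

    def validate(self):
        """Raise ValueError if shapes or bounds are inconsistent."""
        if self.image.ndim != 3 or self.image.shape[2] != 3:
            raise ValueError("image must be H x W x 3")
        if self.intrinsic.shape != (3, 3):
            raise ValueError("intrinsic must be 3 x 3")
        if self.extrinsic.shape != (4, 4):
            raise ValueError("extrinsic must be 4 x 4")
        if not np.allclose(self.extrinsic[3], [0, 0, 0, 1]):
            raise ValueError("last extrinsic row must be [0, 0, 0, 1]")
        rot = self.extrinsic[:3, :3]
        if not np.allclose(rot @ rot.T, np.eye(3), atol=1e-5):
            raise ValueError("extrinsic rotation must be orthonormal")
        if not 0 < self.near < self.far:
            raise ValueError("require 0 < near < far")
```

Checking the rotation block for orthonormality catches a common mistake: passing a matrix that already includes scale or was saved transposed.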
Download the preprocessed DTU training data and Depth_raw from the original MVSNet repo and unzip them. We provide a DTU example; please follow its folder structure.
Run:
CUDA_VISIBLE_DEVICES=$cuda python train_mvs_nerf_pl.py \
--expname $exp_name \
--num_epochs 6 \
--use_viewdirs \
--dataset_name dtu \
--datadir $DTU_DIR
For more options, refer to opt.py. A training command example:
CUDA_VISIBLE_DEVICES=0 python train_mvs_nerf_pl.py \
--with_depth --imgScale_test 1.0 \
--expname mvs-nerf-is-all-your-need \
--num_epochs 6 --N_samples 128 --use_viewdirs --batch_size 1024 \
--dataset_name dtu \
--datadir path/to/dtu/data \
--N_vis 6
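The flags used in the commands above can be mirrored in a small parser, which may help when scripting several experiments. This is a sketch using the standard-library `argparse`, not the repository's opt.py: the default values below are assumptions, and opt.py defines many more options.

```python
import argparse


def build_parser():
    """Hypothetical subset of the training options used in the commands above."""
    p = argparse.ArgumentParser(description="MVSNeRF training options (subset)")
    p.add_argument("--expname", type=str, required=True, help="experiment name")
    p.add_argument("--dataset_name", type=str, default="dtu",
                   choices=["dtu", "dtu_ft", "blender", "llff"])
    p.add_argument("--datadir", type=str, required=True, help="dataset root")
    # Defaults below are assumptions, not values taken from opt.py.
    p.add_argument("--num_epochs", type=int, default=6)
    p.add_argument("--N_samples", type=int, default=128)
    p.add_argument("--batch_size", type=int, default=1024)
    p.add_argument("--N_vis", type=int, default=6, help="validation frequency")
    p.add_argument("--imgScale_test", type=float, default=1.0,
                   help="downsample ratio during validation")
    p.add_argument("--use_viewdirs", action="store_true")
    p.add_argument("--with_depth", action="store_true")
    return p


# Parse the training example above from an explicit argument list.
example = build_parser().parse_args([
    "--with_depth", "--imgScale_test", "1.0",
    "--expname", "mvs-nerf-is-all-your-need",
    "--num_epochs", "6", "--N_samples", "128", "--use_viewdirs",
    "--batch_size", "1024", "--dataset_name", "dtu",
    "--datadir", "path/to/dtu/data", "--N_vis", "6",
])
```

Passing an explicit list to `parse_args` makes it easy to check an argument set without touching `sys.argv`.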
You may need to add --with_depth if you want to evaluate depth quantitatively during training. --N_vis denotes the validation frequency.
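As an illustration of what a quantitative depth comparison can look like, the function below computes the mean absolute depth error and the fraction of pixels whose error falls below given thresholds, over pixels with valid ground truth. This is a sketch: the metric choice is an assumption and is not necessarily what the training script logs, and the thresholds must be given in the same units as the depth maps.

```python
import numpy as np


def depth_metrics(pred, gt, thresholds):
    """Compare predicted and ground-truth depth maps over valid pixels.

    Pixels where gt <= 0 are treated as missing ground truth. Returns the
    mean absolute error and, for each threshold, the fraction of valid
    pixels whose absolute error is below it.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > 0
    if not mask.any():
        raise ValueError("no valid ground-truth depth")
    err = np.abs(pred[mask] - gt[mask])
    metrics = {"abs_err": float(err.mean())}
    for t in thresholds:
        metrics[f"acc@{t}"] = float((err < t).mean())
    return metrics
```

Masking on `gt > 0` matters for scanned datasets, where ground-truth depth is missing on parts of the image.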
--imgScale_test is the downsampling ratio used during validation (e.g. 0.5). Training for 6 epochs takes about 30 hours on a single RTX 2080 Ti.
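When images are downsampled by a ratio such as 0.5, the pinhole intrinsics have to be scaled consistently, otherwise the rays cast from pixels no longer match the image content. A minimal sketch, assuming a standard 3 x 3 matrix K and ignoring half-pixel center conventions; the helper names are hypothetical and not taken from the repository's dataset code.

```python
import numpy as np


def scale_intrinsics(K, scale):
    """Return a copy of the 3x3 pinhole matrix K for an image resized by `scale`.

    fx, fy, cx and cy scale linearly with the image size; the last row
    stays [0, 0, 1]. Half-pixel center conventions are ignored here.
    """
    K_scaled = np.array(K, dtype=np.float64, copy=True)
    K_scaled[:2, :] *= scale
    return K_scaled


def scaled_size(width, height, scale):
    """Image size after resizing, rounded to whole pixels."""
    return int(round(width * scale)), int(round(height * scale))
```

For example, a 640 x 512 image downsampled by 0.5 becomes 320 x 256, and a focal length of 800 pixels becomes 400 pixels.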
Important: please always set batch_size to 1 when training a generalizable model; you can increase it when fine-tuning.
Checkpoint: a pre-trained checkpoint is included in ckpts/mvsnerf-v0.tar.
Evaluation: we also provide a rendering and quantitative evaluation script in renderer.ipynb, and you can use run_batch.py if you want to test or fine-tune on different datasets.
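Novel-view quality is commonly reported as PSNR alongside SSIM and LPIPS (lpips and scikit-image are among the dependencies installed above). As a sketch, assuming images are float arrays with values in [0, 1], PSNR = -10 · log10(MSE); this is the standard definition, not code taken from renderer.ipynb.

```python
import numpy as np


def psnr(pred, gt):
    """Peak signal-to-noise ratio in dB for images with values in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(-10.0 * np.log10(mse))
```

A uniform per-pixel error of 0.1 gives an MSE of 0.01 and therefore a PSNR of 20 dB.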
Renderings from the trained model should look like this:
[Figure: example renderings from the trained model]
Fine-tuning on Blender (Realistic Synthetic). Steps:
Download nerf_synthetic.zip from here, then run:
CUDA_VISIBLE_DEVICES=0 python train_mvs_nerf_finetuning_pl.py \
--dataset_name blender --datadir /path/to/nerf_synthetic/lego \
--expname lego-ft --with_rgb_loss --batch_size 1024 \
--num_epochs 1 --imgScale_test 1.0 --white_bkgd --pad 0 \
--ckpt ./ckpts/mvsnerf-v0.tar --N_vis 1
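The Blender scenes are RGBA renderings, and --white_bkgd indicates rendering against a white background. The sketch below shows the two places this usually matters in NeRF-style code, assuming colors in [0, 1]: compositing the ground-truth RGBA images onto white, and adding the leftover transmittance (1 minus accumulated opacity) to each rendered ray color. This follows the original NeRF convention; that this repository does exactly the same is an assumption, and both helper names are hypothetical.

```python
import numpy as np


def rgba_to_white_rgb(rgba):
    """Composite an H x W x 4 RGBA image (values in [0, 1]) onto white."""
    rgba = np.asarray(rgba, dtype=np.float64)
    rgb, alpha = rgba[..., :3], rgba[..., 3:4]
    return rgb * alpha + (1.0 - alpha)


def composite_white(rgb_map, acc_map):
    """Add white to rendered ray colors in proportion to the remaining transmittance.

    rgb_map: N x 3 accumulated ray colors; acc_map: N accumulated opacities.
    """
    return rgb_map + (1.0 - acc_map)[..., None]
```

Applying the same background to both the targets and the renderings keeps the RGB loss from penalizing empty space.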
Fine-tuning on LLFF (Real Forward-Facing). Steps:
Download nerf_llff_data.zip from here, then run:
CUDA_VISIBLE_DEVICES=0 python train_mvs_nerf_finetuning_pl.py \
--dataset_name llff --datadir /path/to/nerf_llff_data/{scene_name} \
--expname horns-ft --with_rgb_loss --batch_size 1024 \
--num_epochs 1 --imgScale_test 1.0 --pad 24 \
--ckpt ./ckpts/mvsnerf-v0.tar --N_vis 1
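Here {scene_name} stands for one scene folder inside nerf_llff_data. To fine-tune several scenes in turn, the command can be generated per scene. The sketch below only prints the commands, reusing the flags shown above; the scene list is the eight scenes distributed in nerf_llff_data, while the data root and the `{scene}-ft` experiment naming are placeholders, and `llff_finetune_argv` is a hypothetical helper.

```python
import shlex

# Scene folders shipped in nerf_llff_data.
LLFF_SCENES = ["fern", "flower", "fortress", "horns",
               "leaves", "orchids", "room", "trex"]


def llff_finetune_argv(scene, data_root="/path/to/nerf_llff_data",
                       ckpt="./ckpts/mvsnerf-v0.tar"):
    """Argument list for fine-tuning one LLFF scene with the flags shown above."""
    return [
        "python", "train_mvs_nerf_finetuning_pl.py",
        "--dataset_name", "llff",
        "--datadir", f"{data_root}/{scene}",
        "--expname", f"{scene}-ft",
        "--with_rgb_loss", "--batch_size", "1024",
        "--num_epochs", "1", "--imgScale_test", "1.0", "--pad", "24",
        "--ckpt", ckpt, "--N_vis", "1",
    ]


if __name__ == "__main__":
    for scene in LLFF_SCENES:
        # Print one shell-ready command per scene.
        print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(llff_finetune_argv(scene)))
```

Keeping the command as a list avoids quoting problems if a path contains spaces; `shlex.join` (Python 3.8+) adds the quoting only when printing.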
Fine-tuning on DTU. Steps:
CUDA_VISIBLE_DEVICES=0 python train_mvs_nerf_finetuning_pl.py \
--dataset_name dtu_ft --datadir /path/to/DTU/mvs_training/dtu/scan1 \
--expname scan1-ft --with_rgb_loss --batch_size 1024 \
--num_epochs 1 --imgScale_test 1.0 --pad 24 \
--ckpt ./ckpts/mvsnerf-v0.tar --N_vis 1
After training or fine-tuning, you can render free-viewpoint videos with renderer-video.ipynb. If you want to use your own data,
please use a right-handed coordinate system (intrinsics, near/far bounds, and extrinsics given either as
camera-to-world or world-to-camera matrices in OpenCV format) and modify the rendering scripts accordingly.
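Since extrinsics may be given either as camera-to-world or world-to-camera matrices, it can help to see how the two relate: for a rigid pose [R | t], the inverse is [R^T | -R^T t]. The sketch below also shows the usual conversion of a camera-to-world pose from the OpenGL/Blender camera convention (x right, y up, looking along -z) to the OpenCV convention (x right, y down, looking along +z), which flips the camera's y and z axes. Both helper names are hypothetical; which convention the rendering scripts expect should be checked against the code.

```python
import numpy as np


def invert_pose(T):
    """Invert a 4x4 rigid transform [R | t] (c2w <-> w2c) without a general inverse."""
    T = np.asarray(T, dtype=np.float64)
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def c2w_opengl_to_opencv(c2w):
    """Flip the camera's y and z axes: OpenGL/Blender -> OpenCV camera convention."""
    return np.asarray(c2w, dtype=np.float64) @ np.diag([1.0, -1.0, -1.0, 1.0])
```

Right-multiplying by the diagonal matrix changes only the camera axes (the columns of R), so the camera center in world coordinates stays the same; applying the flip twice restores the original pose.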
After 10k iterations (about 15 min), you should get videos like this:
[Video: example free-viewpoint renderings]
If you find our code or paper helpful, please consider citing:
@article{chen2021mvsnerf,
title={MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo},
author={Chen, Anpei and Xu, Zexiang and Zhao, Fuqiang and Zhang, Xiaoshuai and Xiang, Fanbo and Yu, Jingyi and Su, Hao},
journal={arXiv preprint arXiv:2103.15595},
year={2021}
}
Big thanks to CasMVSNet_pl; our code partially borrows from it.
Relevant works:
MVSNet: Depth Inference for Unstructured Multi-view Stereo (ECCV 2018)
Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan
Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching (CVPR 2020)
Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, Ping Tan
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ECCV 2020)
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng
IBRNet: Learning Multi-View Image-Based Rendering (CVPR 2021)
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser
PixelNeRF: Neural Radiance Fields from One or Few Images (CVPR 2021)
Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa