
[ICCV 2023] SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis


SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

Guangcong Wang, Zhaoxi Chen, Chen Change Loy, Ziwei Liu
S-Lab, Nanyang Technological University
ICCV 2023


🐤 Update:

🐤 Features:

🐤 TL;DR: We present SparseNeRF, a simple yet effective method that synthesizes novel views given a few images. SparseNeRF distills robust local depth ranking priors from real-world inaccurate depth observations, such as pre-trained monocular depth estimation models or consumer-level depth sensors.

🐤 Abstract: Neural Radiance Field (NeRF) significantly degrades when only a limited number of views are available. To complement the lack of 3D information, depth-based models, such as DSNeRF and MonoSDF, explicitly assume the availability of accurate depth maps of multiple views. They linearly scale the accurate depth maps as supervision to guide the predicted depth of few-shot NeRFs. However, accurate depth maps are difficult and expensive to capture due to wide-range depth distances in the wild.

In this work, we present a new Sparse-view NeRF (SparseNeRF) framework that exploits depth priors from real-world inaccurate observations. The inaccurate depth observations are either from pre-trained depth models or coarse depth maps of consumer-level depth sensors. Since coarse depth maps are not strictly scaled to the ground-truth depth maps, we propose a simple yet effective constraint, a local depth ranking method, on NeRFs such that the expected depth ranking of the NeRF is consistent with that of the coarse depth maps in local patches. To preserve the spatial continuity of the estimated depth of NeRF, we further propose a spatial continuity constraint to encourage the consistency of the expected depth continuity of NeRF with coarse depth maps. Surprisingly, with simple depth ranking constraints, SparseNeRF outperforms all state-of-the-art few-shot NeRF methods (including depth-based models) on standard LLFF and DTU datasets. Moreover, we collect a new dataset NVS-RGBD that contains real-world depth maps from Azure Kinect, ZED 2, and iPhone 13 Pro. Extensive experiments on NVS-RGBD dataset also validate the superiority and generalizability of SparseNeRF.
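To illustrate why a ranking constraint can tolerate coarse depth that is "not strictly scaled" to the ground truth, here is a minimal NumPy sketch. The depth values and the square-root distortion are hypothetical, chosen only for illustration; they are not taken from the paper or the datasets. The sketch compares how well the best linear scale and shift aligns a distorted depth signal with the true one against how well the pairwise depth order is preserved.

```python
import numpy as np


def pairwise_order_agreement(a, b):
    """Fraction of pixel pairs whose depth order agrees between a and b."""
    sign_a = np.sign(a[:, None] - a[None, :])
    sign_b = np.sign(b[:, None] - b[None, :])
    return float(np.mean(sign_a == sign_b))


def linear_fit_rmse(coarse, target):
    """RMSE left after the best least-squares scale and shift of coarse onto target."""
    scale, shift = np.polyfit(coarse, target, deg=1)
    return float(np.sqrt(np.mean((scale * coarse + shift - target) ** 2)))


# Hypothetical ground-truth depths and a monotonic but nonlinear distortion (assumption).
true_depth = np.linspace(1.0, 10.0, 50)
coarse_depth = np.sqrt(true_depth)

agreement = pairwise_order_agreement(true_depth, coarse_depth)
rmse = linear_fit_rmse(coarse_depth, true_depth)
print("order agreement:", agreement)
print("RMSE after linear scaling:", rmse)
```

Under this assumed distortion, linear scaling leaves a nonzero residual while the depth order of every pair is unchanged, which is the property a ranking-based constraint relies on.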

🐤 Framework Overview: SparseNeRF consists of two streams, i.e., NeRF and depth prior distillation. For the NeRF stream, we use Mip-NeRF as the backbone, trained with a NeRF reconstruction loss. For depth prior distillation, we distill depth priors from a pre-trained depth model. Specifically, we propose a local depth ranking regularization and a spatial continuity regularization to distill robust depth priors from coarse depth maps.
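The two regularizers described above can be sketched roughly as follows. This is a simplified NumPy illustration under stated assumptions, not the repository's JAX implementation: the function names, the margin value, the `coarse_tol` threshold, and the choice of 4-connected neighbours for the continuity term are all hypothetical.

```python
import numpy as np


def depth_ranking_loss(pred, coarse, margin=1e-4):
    """Hinge penalty on pixel pairs whose predicted order contradicts the coarse order.

    pred, coarse: 1-D arrays holding the depths of the pixels in one local patch.
    margin is an assumed hyperparameter.
    """
    pred = np.asarray(pred, dtype=np.float64)
    coarse = np.asarray(coarse, dtype=np.float64)
    # closer[i, j] is True when the coarse map says pixel i is nearer than pixel j.
    closer = coarse[:, None] < coarse[None, :]
    # For those pairs, pred[i] should not exceed pred[j] (up to the margin).
    violation = np.maximum(pred[:, None] - pred[None, :] + margin, 0.0)
    n_pairs = max(int(closer.sum()), 1)
    return float((violation * closer).sum() / n_pairs)


def spatial_continuity_loss(pred, coarse, coarse_tol=0.05, margin=1e-4):
    """Penalise predicted depth jumps between neighbours that the coarse map treats as continuous.

    pred, coarse: 2-D arrays (H, W) for one patch. Neighbours are adjacent pixels
    along rows and columns; coarse_tol and margin are assumed hyperparameters.
    """
    pred = np.asarray(pred, dtype=np.float64)
    coarse = np.asarray(coarse, dtype=np.float64)
    total, count = 0.0, 0
    for axis in (0, 1):
        d_pred = np.abs(np.diff(pred, axis=axis))
        d_coarse = np.abs(np.diff(coarse, axis=axis))
        smooth = d_coarse < coarse_tol  # pairs the coarse map considers continuous
        total += float((np.maximum(d_pred - margin, 0.0) * smooth).sum())
        count += int(smooth.sum())
    return total / max(count, 1)


# Toy 2x2 patch: the top row is near, the bottom row is far in the coarse map.
coarse_patch = np.array([[1.0, 1.01], [3.0, 3.02]])
consistent = np.array([[0.2, 0.21], [0.9, 0.91]])  # same order, different scale
flipped = consistent[::-1].copy()                   # near and far rows swapped

print(depth_ranking_loss(consistent.ravel(), coarse_patch.ravel()))
print(depth_ranking_loss(flipped.ravel(), coarse_patch.ravel()))
print(spatial_continuity_loss(consistent, coarse_patch))
```

Note that the ranking term only compares orderings, so the predicted depths in `consistent` incur no ranking penalty even though their scale differs from the coarse patch.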

1. Prerequisites

  • Linux or macOS
  • Python 3.6.13
  • NVIDIA GPU + CUDA 10.1 + cuDNN
  • OpenCV

2. Installation

We recommend using a virtual environment (conda) to run the code easily.

conda create -n sparsenerf python=3.6.13
conda activate sparsenerf
pip install -r requirements.txt

Download and install the jaxlib wheel with CUDA support (jaxlib-0.1.68+cuda101-cp36):

wget https://storage.googleapis.com/jax-releases/cuda101/jaxlib-0.1.68+cuda101-cp36-none-manylinux2010_x86_64.whl
pip install jaxlib-0.1.68+cuda101-cp36-none-manylinux2010_x86_64.whl
rm jaxlib-0.1.68+cuda101-cp36-none-manylinux2010_x86_64.whl

Install PyTorch and related packages for the pretrained depth models:

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch
pip install timm
pip install opencv-python

Install ffmpeg for composing videos:

pip install imageio-ffmpeg
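
After the installation steps above, a quick check that the packages are importable can save a failed training run. The following is a small helper sketch, not part of the repository; the list of import names is an assumption derived from the install commands above (for example, `opencv-python` is imported as `cv2`), and packages pulled in by requirements.txt are not covered.

```python
import importlib.util
import sys

# Import names assumed from the install commands above (not from requirements.txt).
REQUIRED = ("jaxlib", "torch", "torchvision", "torchaudio", "timm", "cv2", "imageio_ffmpeg")


def missing_packages(names=REQUIRED):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

Run it inside the activated `sparsenerf` environment; any name it reports should be reinstalled before training.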

3. Dataset

3.1 Download DTU dataset

  • Download "Rectified (123 GB)" and "SampleSet (6.3 GB)" of the DTU dataset from the official website
  • Data: extract "Rectified (123 GB)"
  • Poses: extract "SampleSet/MVS\ Data/Calibration/cal18/" from "SampleSet (6.3 GB)"
  • Masks: download masks (used for evaluation only) from this link
  • Get depth maps: set the variables $root_path, $benchmark, and $dataset_id in scripts/get_depth_map_for_dtu.sh, and run:
sh scripts/get_depth_map_for_dtu.sh

3.2 Download LLFF dataset

  • Download LLFF from the official download link.
  • Get depth maps: set the variables $root_path, $benchmark, and $dataset_id in scripts/get_depth_map_for_llff.sh, and run:
sh scripts/get_depth_map_for_llff.sh

3.3 Download NVS-RGBD dataset

4. Training

4.1 Training on LLFF

Please set the variables in scripts/train_llff.sh and configs/llff3.gin, and run:

sh scripts/train_llff.sh

4.2 Training on DTU

Please set the variables in scripts/train_dtu.sh, and run:

sh scripts/train_dtu.sh

4.3 Training on NVS-RGBD

Similar to 4.1 and 4.2. The depth maps are from depth sensors.

sh scripts/train_kinect.sh
sh scripts/train_zed.sh
sh scripts/train_iphone.sh

5. Test

5.1 Evaluation on LLFF

Please set the variables in scripts/eval_llff.sh or scripts/eval_dtu.sh (the same as in scripts/train_llff.sh and scripts/train_dtu.sh), and run:

sh scripts/eval_llff.sh

5.2 Evaluation on DTU

sh scripts/eval_dtu.sh

5.3 Evaluation on NVS-RGBD

sh scripts/eval_kinect.sh
sh scripts/eval_zed.sh
sh scripts/eval_iphone.sh

6. (Optional) Render videos

Please set the variables (the same as train_llff.sh and train_dtu.sh) in render_llff.sh or render_dtu.sh, and run.

6.1 Render videos on LLFF

sh scripts/render_llff.sh

6.2 Render videos on DTU

sh scripts/render_dtu.sh

6.3 Render videos on NVS-RGBD

sh scripts/render_kinect.sh
sh scripts/render_zed.sh
sh scripts/render_iphone.sh

7. (Optional) Compose videos

Please set the variables in generate_video_llff.sh or other scripts, and run.

7.1 Compose videos on LLFF

sh generate_video_llff.sh

7.2 Compose videos on DTU

sh generate_video_dtu.sh

7.3 Compose videos on NVS-RGBD

sh generate_video_kinect.sh
sh generate_video_zed.sh
sh generate_video_iphone.sh

8. (Optional) TensorBoard for visualizing training

tensorboard --logdir=./out/xxx/ --port=6006

If it raises errors, see Q2 of the FAQ.

9. Citation

If you find this useful for your research, please cite our paper.

@inproceedings{wang2022sparsenerf,
   author    = {Wang, Guangcong and Chen, Zhaoxi and Loy, Chen Change and Liu, Ziwei},
   title     = {SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis},
   booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
   year      = {2023},
}

or

Guangcong Wang, Zhaoxi Chen, Chen Change Loy, and Ziwei Liu. SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis, IEEE/CVF International Conference on Computer Vision (ICCV) 2023.

10. Related Links

RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs, CVPR, 2022

ViTA: Video Transformer Adaptor for Robust Video Depth Estimation

Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs

SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

StyleLight: HDR Panorama Generation for Lighting Estimation and Editing, ECCV 2022.

Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

Relighting4D: Neural Relightable Human from Videos, ECCV 2022