
ICCV2023 (Oral) MixVoxels https://arxiv.org/abs/2212.00190


[ICCV2023 Oral] MixVoxels: Mixed Neural Voxels for Fast Multi-view Video Synthesis

PyTorch implementation of the paper: Mixed Neural Voxels for Fast Multi-view Video Synthesis.

cat.mp4

More complicated scenes with fast movements and large areas of motion.

output.mp4

We present MixVoxels to better represent dynamic scenes, with fast training speed and competitive rendering quality. MixVoxels represents a 4D dynamic scene as a mixture of static and dynamic voxels and processes them with different networks. In this way, the quantities required for static voxels can be computed by a lightweight model, which substantially reduces computation, especially for the many everyday dynamic scenes dominated by a static background. As a result, with 15 minutes of training on 300-frame input videos, MixVoxels achieves better PSNR than previous methods.
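The static/dynamic routing described above can be sketched as follows. This is a minimal NumPy illustration, not the repo's implementation: the function name, the threshold, and the two stand-in "branches" are all hypothetical; the real model uses separate neural networks for the two voxel sets.

```python
import numpy as np

def mixed_forward(voxel_features, temporal_std, threshold=0.1):
    """Route each voxel to a static or dynamic branch by temporal variation.

    voxel_features: (N, C) per-voxel features
    temporal_std:   (N,) per-voxel standard deviation over time (precomputed)
    Returns per-voxel outputs and the dynamic mask. The branch functions
    here are placeholders; the static one is deliberately cheaper.
    """
    dynamic_mask = temporal_std > threshold

    static_branch = lambda x: x.mean(axis=-1)           # lightweight stand-in
    dynamic_branch = lambda x: np.tanh(x).sum(axis=-1)  # heavier stand-in

    out = np.empty(voxel_features.shape[0])
    out[~dynamic_mask] = static_branch(voxel_features[~dynamic_mask])
    out[dynamic_mask] = dynamic_branch(voxel_features[dynamic_mask])
    return out, dynamic_mask
```

The point of the split is that most voxels in everyday scenes fall below the threshold, so the expensive branch only runs on a small fraction of the volume.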

Installation

Install environment:

conda create -n mixvoxels python=3.8
conda activate mixvoxels
pip install torch torchvision
pip install tqdm scikit-image opencv-python configargparse lpips imageio-ffmpeg kornia tensorboard pyfvvdp

Dataset

  1. Download the Plenoptic Video Dataset
  2. Unzip to your directory DATADIR and run the following command:
python tools/prepare_video.py ${DATADIR}

Training

To train a dynamic scene, run the following commands; you can train different dynamic scenes by assigning DATA a different scene name:

DATA=coffee_martini # [coffee_martini|cut_roasted_beef|cook_spinach|flame_salmon|flame_steak|sear_steak]
# MixVoxels-T
python train.py --config configs/schedule5000/${DATA}_5000.txt --render_path 0
# MixVoxels-S
python train.py --config configs/schedule7500/${DATA}_7500.txt --render_path 0
# MixVoxels-M
python train.py --config configs/schedule12500/${DATA}_12500.txt --render_path 0
# MixVoxels-L
python train.py --config configs/schedule25000/${DATA}_25000.txt --render_path 0

Please note that on the first run, the above command will pre-process the dataset: it resizes the frames by a factor of 2 (to the standard 1K resolution) and computes the std of each video, saving the results to disk. Pre-processing takes about 2 hours but is only required once. After pre-processing, the command automatically trains the scene.
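The per-video std computed during pre-processing is just the per-pixel standard deviation across frames, which separates moving regions from the static background. A minimal sketch (the function name is hypothetical; see tools/prepare_video.py for the actual pre-processing):

```python
import numpy as np

def video_temporal_std(frames):
    """Per-pixel standard deviation over time.

    frames: float array of shape (T, H, W) for a grayscale video;
    returns an (H, W) map. High values mark moving regions,
    near-zero values mark the static background.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return frames.std(axis=0)
```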

We provide trained models:

| scene | MixVoxels-T (15min) PSNR | MixVoxels-M (40min) PSNR | MixVoxels-T download | MixVoxels-M download |
| --- | --- | --- | --- | --- |
| coffee-martini | 28.1339 | 29.0186 | link | link |
| flame-salmon | 28.7982 | 29.2620 | link | link |
| cook-spinach | 31.4499 | 31.6433 | link | link |
| cut-roasted-beef | 32.4078 | 32.2800 | link | link |
| flame-steak | 31.6508 | 31.3052 | link | link |
| sear-steak | 31.8203 | 31.2136 | link | link |

Rendering and Generating Spirals

The following command will generate 120 novel view videos; alternatively, set render_path to 1 in the training commands above.

python train.py --config your_config --render_only 1 --render_path 1 --ckpt log/your_config/your_config.ckpt

Generating spirals:

python tools/make_spiral.py --videos_path log/your_config/imgs_path_all/ --target log/your_config/spirals --target_video log/your_config/spirals.mp4
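The spiral is the usual NeRF-style novel-view trajectory: camera centers circle the scene while the height oscillates. A minimal sketch of generating such positions (illustrative only; the function name and parameters are hypothetical, not those of tools/make_spiral.py):

```python
import numpy as np

def spiral_positions(n_frames=120, radius=1.0, zrate=0.5, rots=2):
    """Camera centers on a spiral path: a circle in the xy-plane
    whose z-coordinate oscillates sinusoidally. Returns (n_frames, 3)."""
    t = np.linspace(0.0, 2.0 * np.pi * rots, n_frames)
    x = radius * np.cos(t)
    y = radius * np.sin(t)
    z = -radius * np.sin(t * zrate)
    return np.stack([x, y, z], axis=1)
```

Each position would then be paired with a rotation that keeps the camera pointed at the scene center to form a full pose.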

Citation

If you find our code or paper helpful, please consider citing:

@article{wang2022mixed,
  title={Mixed Neural Voxels for Fast Multi-view Video Synthesis},
  author={Wang, Feng and Tan, Sinan and Li, Xinghang and Tian, Zeyue and Liu, Huaping},
  journal={arXiv preprint arXiv:2212.00190},
  year={2022}
}

Acknowledgement

The code is based on TensoRF; many thanks to its authors.