
ICCV2023 (Oral) MixVoxels https://arxiv.org/abs/2212.00190


[ICCV2023 Oral] MixVoxels: Mixed Neural Voxels for Fast Multi-view Video Synthesis

PyTorch implementation of the paper: Mixed Neural Voxels for Fast Multi-view Video Synthesis.

cat.mp4

More complicated scenes with fast movements and large areas of motion.

output.mp4

We present MixVoxels to better represent dynamic scenes, with fast training speed and competitive rendering quality. MixVoxels represents a 4D dynamic scene as a mixture of static and dynamic voxels and processes them with different networks. In this way, the quantities required for static voxels can be computed by a lightweight model, which substantially reduces computation, especially for the many everyday dynamic scenes dominated by a static background. As a result, with 15 minutes of training on 300-frame input videos, MixVoxels achieves better PSNR than previous methods.
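The static/dynamic routing described above can be sketched as follows. This is a minimal NumPy illustration, not the repo's implementation: the function name, the threshold, and the two stand-in "branches" are all hypothetical; the real model uses separate neural networks for the two voxel sets.

```python
import numpy as np

def mixed_forward(voxel_features, temporal_std, threshold=0.1):
    """Route each voxel to a static or dynamic branch by temporal variation.

    voxel_features: (N, C) per-voxel features
    temporal_std:   (N,) per-voxel standard deviation over time (precomputed)
    Returns per-voxel outputs and the dynamic mask. The branch functions
    here are placeholders; the static one is deliberately cheaper.
    """
    dynamic_mask = temporal_std > threshold

    static_branch = lambda x: x.mean(axis=-1)           # lightweight stand-in
    dynamic_branch = lambda x: np.tanh(x).sum(axis=-1)  # heavier stand-in

    out = np.empty(voxel_features.shape[0])
    out[~dynamic_mask] = static_branch(voxel_features[~dynamic_mask])
    out[dynamic_mask] = dynamic_branch(voxel_features[dynamic_mask])
    return out, dynamic_mask
```

The point of the split is that most voxels in everyday scenes fall below the threshold, so the expensive branch only runs on a small fraction of the volume.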

Installation

Install environment:

conda create -n mixvoxels python=3.8
conda activate mixvoxels
pip install torch torchvision
pip install tqdm scikit-image opencv-python configargparse lpips imageio-ffmpeg kornia tensorboard pyfvvdp

Dataset

  1. Download the Plenoptic Video Dataset
  2. Unzip to your directory DATADIR and run the following command:
python tools/prepare_video.py ${DATADIR}

Training

To train a dynamic scene, run the following commands; you can train different dynamic scenes by assigning DATA a different scene name:

DATA=coffee_martini # [coffee_martini|cut_roasted_beef|cook_spinach|flame_salmon|flame_steak|sear_steak]
# MixVoxels-T
python train.py --config configs/schedule5000/${DATA}_5000.txt --render_path 0
# MixVoxels-S
python train.py --config configs/schedule7500/${DATA}_7500.txt --render_path 0
# MixVoxels-M
python train.py --config configs/schedule12500/${DATA}_12500.txt --render_path 0
# MixVoxels-L
python train.py --config configs/schedule25000/${DATA}_25000.txt --render_path 0

Please note that on the first run, the above command will pre-process the dataset: it resizes the frames by a factor of 2 (to the standard 1K resolution) and computes the std of each video, saving the results to disk. Pre-processing takes about 2 hours but is only required once. After pre-processing, the command automatically trains the scene.
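The per-video std computed during pre-processing is just the per-pixel standard deviation across frames, which separates moving regions from the static background. A minimal sketch (the function name is hypothetical; see tools/prepare_video.py for the actual pre-processing):

```python
import numpy as np

def video_temporal_std(frames):
    """Per-pixel standard deviation over time.

    frames: float array of shape (T, H, W) for a grayscale video;
    returns an (H, W) map. High values mark moving regions,
    near-zero values mark the static background.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return frames.std(axis=0)
```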

We provide trained models:

| scene | MixVoxels-T (15min) PSNR | MixVoxels-M (40min) PSNR | MixVoxels-T download | MixVoxels-M download |
| --- | --- | --- | --- | --- |
| coffee-martini | 28.1339 | 29.0186 | link | link |
| flame-salmon | 28.7982 | 29.2620 | link | link |
| cook-spinach | 31.4499 | 31.6433 | link | link |
| cut-roasted-beef | 32.4078 | 32.2800 | link | link |
| flame-steak | 31.6508 | 31.3052 | link | link |
| sear-steak | 31.8203 | 31.2136 | link | link |

Rendering and Generating Spirals

The following command will generate 120 novel view videos; alternatively, set render_path to 1 in the training commands above.

python train.py --config your_config --render_only 1 --render_path 1 --ckpt log/your_config/your_config.ckpt

Generating spirals:

python tools/make_spiral.py --videos_path log/your_config/imgs_path_all/ --target log/your_config/spirals --target_video log/your_config/spirals.mp4
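The spiral is the usual NeRF-style novel-view trajectory: camera centers circle the scene while the height oscillates. A minimal sketch of generating such positions (illustrative only; the function name and parameters are hypothetical, not those of tools/make_spiral.py):

```python
import numpy as np

def spiral_positions(n_frames=120, radius=1.0, zrate=0.5, rots=2):
    """Camera centers on a spiral path: a circle in the xy-plane
    whose z-coordinate oscillates sinusoidally. Returns (n_frames, 3)."""
    t = np.linspace(0.0, 2.0 * np.pi * rots, n_frames)
    x = radius * np.cos(t)
    y = radius * np.sin(t)
    z = -radius * np.sin(t * zrate)
    return np.stack([x, y, z], axis=1)
```

Each position would then be paired with a rotation that keeps the camera pointed at the scene center to form a full pose.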

Citation

If you find our code or paper helpful, please consider citing:

@article{wang2022mixed,
  title={Mixed Neural Voxels for Fast Multi-view Video Synthesis},
  author={Wang, Feng and Tan, Sinan and Li, Xinghang and Tian, Zeyue and Liu, Huaping},
  journal={arXiv preprint arXiv:2212.00190},
  year={2022}
}

Acknowledgement

The code is based on TensoRF; many thanks to its authors.