Paper | Arxiv | Project Page
This repository contains the official implementation of single-image novel view synthesis (NVS) from the project MegaScenes: Scene-Level View Synthesis at Scale. Details on the dataset can be found here.
If you find our code or paper useful, please consider citing:
@misc{tung2024megascenes,
title={MegaScenes: Scene-Level View Synthesis at Scale},
author={Tung, Joseph and Chou, Gene and Cai, Ruojin and Yang, Guandao and Zhang, Kai and Wetzstein, Gordon and Hariharan, Bharath and Snavely, Noah},
year={2024},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
We recommend creating a conda environment and then installing the required packages with the following commands:
conda create -n megascenes python=3.8 pip --yes
conda activate megascenes
bash setup_env.sh
We provide two checkpoints in the MegaScenes AWS bucket. Download the folder s3://megascenes/nvs_checkpoints/warp_plus_pose/iter_112000/ to the directory configs/warp_plus_pose/iter_112000/. This model is conditioned on warped images and poses, as described in the paper. Download the folder s3://megascenes/nvs_checkpoints/zeronvs_finetune/iter_90000/ to the directory configs/zeronvs_finetune/iter_90000/. This checkpoint is ZeroNVS finetuned on MegaScenes. For comparison, also download the original ZeroNVS checkpoint and save it as configs/zeronvs_original/iter_0/zeronvs.ckpt.
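The download step above can be scripted. The following is a minimal sketch, assuming the AWS CLI is installed and that the bucket allows unsigned public reads (the `--no-sign-request` option); neither assumption is confirmed by this README. The helper names `build_sync_command` and `download_checkpoints` are hypothetical and are not part of this repository. By default the script only prints the commands it would run.

```python
import shutil
import subprocess

# (S3 source, local destination) pairs taken from the instructions above.
CHECKPOINTS = [
    ("s3://megascenes/nvs_checkpoints/warp_plus_pose/iter_112000/",
     "configs/warp_plus_pose/iter_112000/"),
    ("s3://megascenes/nvs_checkpoints/zeronvs_finetune/iter_90000/",
     "configs/zeronvs_finetune/iter_90000/"),
]


def build_sync_command(src, dst, anonymous=True):
    """Return the argv list for `aws s3 sync src dst`."""
    cmd = ["aws", "s3", "sync", src, dst]
    if anonymous:
        # Assumption: the bucket permits unsigned (public) reads.
        cmd.append("--no-sign-request")
    return cmd


def download_checkpoints(dry_run=True):
    """Print each sync command; execute it only when dry_run is False."""
    for src, dst in CHECKPOINTS:
        cmd = build_sync_command(src, dst)
        print(" ".join(cmd))
        if not dry_run:
            if shutil.which("aws") is None:
                raise RuntimeError("AWS CLI not found on PATH")
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    download_checkpoints(dry_run=True)
```

Call `download_checkpoints(dry_run=False)` to run the transfers. The original ZeroNVS checkpoint is not included here because it is not hosted at a path given in this README.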
The following commands create videos along two pre-defined camera paths (orbit and spiral). The -i flag gives the path of the reference image and -s gives the output path.
The generated .gif files will be located at qual_eval/warp_plus_pose/audley/orbit/videos/best.gif and .../spiral/videos/best.gif. The warped images at each camera location will be located at qual_eval/warp_plus_pose/audley/orbit/warped/warps.gif and .../spiral/warped/warps.gif.
Adjust the batch size as needed.
python video_script.py -e configs/warp_plus_pose/ -r 112000 -i data/examples/audley_end_house.jpg -s qual_eval/warp_plus_pose/audley
python video_script.py -e configs/zeronvs_finetune/ -r 90000 -i data/examples/audley_end_house.jpg -s qual_eval/zeronvs_finetune/audley -z
python video_script.py -e configs/zeronvs_original/ -r 0 -i data/examples/audley_end_house.jpg -s qual_eval/zeronvs_original/audley -z --ckpt_file
The MegaScenes dataset is hosted on AWS. Documentation can be found here. Training the NVS model requires image pairs together with their camera parameters and warped images. We provide the filtered image pairs and camera parameters in s3://megascenes/nvs_checkpoints/splits/. Download the folder to data/splits/.
Each .pkl file is a list of lists with the format
[img 1, img 2, {img 1 extrinsics, img 1 intrinsics}, {img 2 extrinsics, img 2 intrinsics}, scale (of img 1's translation vector based on 20th quantile of depth)].
See dataloader/paired_dataset.py for details.
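As an illustration of the split format described above, the sketch below loads a .pkl file and names the five fields of one entry. It runs on a small synthetic file, not on the real splits. The dictionary keys `extrinsics` and `intrinsics`, the matrix shapes, and the helper names `load_pairs` and `unpack_pair` are assumptions made for this example; dataloader/paired_dataset.py remains the authoritative reference for the actual layout.

```python
import os
import pickle
import tempfile


def load_pairs(pkl_path):
    """Load a split file and check that every entry has five fields."""
    with open(pkl_path, "rb") as f:
        pairs = pickle.load(f)
    for i, entry in enumerate(pairs):
        if len(entry) != 5:
            raise ValueError(f"entry {i} has {len(entry)} fields, expected 5")
    return pairs


def unpack_pair(entry):
    """Name the five fields of one entry, in the documented order."""
    img1, img2, cam1, cam2, scale = entry
    return {"img1": img1, "img2": img2, "cam1": cam1, "cam2": cam2, "scale": scale}


# Synthetic stand-in for a real split file (values are made up).
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
intrinsics = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
# Hypothetical key names; the real files may store cameras differently.
fake_cam = {"extrinsics": identity, "intrinsics": intrinsics}
fake_pairs = [["a.jpg", "b.jpg", fake_cam, fake_cam, 1.0]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fake_split.pkl")
    with open(path, "wb") as f:
        pickle.dump(fake_pairs, f)
    loaded = load_pairs(path)

first = unpack_pair(loaded[0])
print(first["img1"], first["img2"], first["scale"])
```

Because pickle can execute arbitrary code on load, only open split files obtained from a source you trust.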
We recommend precomputing the warped images before training. We provide code to warp a reference image to a target pose given its depth map and camera parameters.
from dataloader.util_3dphoto import unproject_depth, render_view
mesh = unproject_depth('mesh_path.ply', img, depthmap, intrinsics, c2w_original_pose, scale_factor=1.0, add_faces=True, prune_edge_faces=True)
warped_image, _ = render_view(h, w, intrinsics, c2w_target_pose, mesh)
We currently do not provide the aligned depth maps and warped images.
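To make the geometry behind depth-based warping concrete, here is a simplified per-pixel sketch in plain Python. It is not the repository's method: unproject_depth and render_view build and render a mesh, whereas this example only reprojects a single pixel. It assumes a pinhole intrinsics matrix, 4x4 camera-to-world poses, depth measured along the camera's +z axis, and rigid poses (rotation plus translation); these conventions and the helper names are assumptions for illustration.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


def invert_rigid(c2w):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    rot_t = [[c2w[c][r] for c in range(3)] for r in range(3)]
    t = [c2w[r][3] for r in range(3)]
    neg = [-x for x in mat_vec(rot_t, t)]
    return [rot_t[0] + [neg[0]],
            rot_t[1] + [neg[1]],
            rot_t[2] + [neg[2]],
            [0.0, 0.0, 0.0, 1.0]]


def reproject_pixel(u, v, depth, K, c2w_src, c2w_tgt):
    """Map pixel (u, v) with known depth from the source view to the target view."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # 1. Unproject the pixel to a 3D point in the source camera frame.
    p_src = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth, 1.0]
    # 2. Move the point into world coordinates.
    p_world = mat_vec(c2w_src, p_src)
    # 3. Move the point into the target camera frame.
    x, y, z, _ = mat_vec(invert_rigid(c2w_tgt), p_world)
    if z <= 0:
        return None  # the point lies behind the target camera
    # 4. Project with the same intrinsics.
    return (fx * x / z + cx, fy * y / z + cy)


K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
# Target camera moved one unit along +x relative to the source camera.
shifted = [row[:] for row in identity]
shifted[0][3] = 1.0

print(reproject_pixel(50.0, 50.0, 2.0, K, identity, identity))
print(reproject_pixel(50.0, 50.0, 2.0, K, identity, shifted))
```

With identical poses the pixel maps to itself; moving the target camera to the right shifts the projected point to the left, by an amount inversely proportional to depth. A full image warp applies this to every pixel and must also handle occlusions and holes, which is what the mesh-based rendering addresses.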
accelerate launch --config_file acc_configs/{number_of_gpus}.yaml train.py -e configs/warp_plus_pose/ -b {batch_size} -w {workers}
We use a batch size of 88 on an A6000 with 49 GB of VRAM.
python test.py -e configs/warp_plus_pose -r 112000 -s warp_plus_pose_evaluation -b {batch_size} -w {workers} --save_generations True --save_data
python test.py -e configs/zeronvs_finetune -r 90000 -s zeronvs_evaluation -b {batch_size} -w {workers} --save_generations True
Generated images and metrics are saved under quant_eval/ in the folder named by -s, for example quant_eval/warp_plus_pose_evaluation. The -r flag selects the saved checkpoint to load. The warped images should also be prepared in advance, since they are needed for calculating metrics.
We adapt code from
Zero-1-to-3 https://zero123.cs.columbia.edu/
ZeroNVS https://kylesargent.github.io/zeronvs/