NeurIPS 2024
Junyoung Seo*,1,3 Kazumi Fukuda1 Takashi Shibuya1 Takuya Narihira1
Naoki Murata1 Shoukang Hu1 Chieh-Hsin Lai1
Seungryong Kim†,3 Yuki Mitsufuji†,1,2
1Sony AI 2Sony Group Corporation 3KAIST
*Work done during an internship at Sony AI. †Co-corresponding authors.
Introduction | Demo | Examples | How to use | Citation | Acknowledgements
- 26/09/2024: Our paper has been accepted to NeurIPS 2024
- 13/09/2024: Added example with Depth Anything V2
- 27/08/2024: Code and demos released
This repository is the official implementation of the paper "GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping". GenWarp generates novel-view images from a single input image, conditioned on camera poses. This repository provides the inference code for the model. For detailed information, please refer to the paper.
Here is a quick preview of GenWarp in action. Try it yourself at Spaces or run it locally on your machine. See the How to use section for more details. (Left) 3D scene reconstructed from the input image and the estimated depth. (Middle) Warped image. (Right) Generated image.
demo_video.mp4
Our model can handle images from various domains, including indoor and outdoor scenes and even illustrations, under challenging camera viewpoint changes.
You can find examples on our project page and in our paper. Or even better, you can try your favourite images on the live demo at Spaces.
Generated novel views can be used for 3D reconstruction. In the example below, we reconstructed a 3D scene via InstantSplat. We generated the video using this implementation.
genwarp_instantsplat.mp4
We tested our code on Ubuntu 20.04 with an NVIDIA A100 GPU. If you are using another platform such as Windows, consider using Docker. You can either add the packages to your own Python environment or use Docker to build one. All commands below are expected to be run from the root directory of the repository.
Note
You may want to change the username and uid variables in the Dockerfile. Please check the Dockerfile before running the commands below.
docker build . -t genwarp:latest
docker run --gpus=all -it -v $(pwd):/workspace/genwarp -w /workspace/genwarp genwarp
Inside the docker container, you can install packages as below.
We tested the environment with python >=3.10 and CUDA =11.8. To add mandatory dependencies, run the command below.
pip install -r requirements.txt
To run the development code, such as the Jupyter notebook example and the Gradio live demo, install the extra dependencies with the command below.
pip install -r requirements_dev.txt
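To confirm that an environment matches the tested versions, a small check like the one below may help. It is only a sketch: the helper names `python_ok` and `cuda_version` are hypothetical and not part of this repository, and the CUDA check assumes PyTorch is already installed (it returns None otherwise).

```python
import sys

TESTED_PYTHON = (3, 10)  # Minimum Python version reported as tested above.
TESTED_CUDA = "11.8"     # CUDA version reported as tested above.


def python_ok(version=None):
    """Return True if a (major, minor) version tuple is at least 3.10."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= TESTED_PYTHON


def cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch  # Only available once the requirements are installed.
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds.


if __name__ == "__main__":
    print("Python OK:", python_ok())
    found = cuda_version()
    if found is None:
        print("CUDA: PyTorch is not installed or is a CPU-only build")
    else:
        print(f"CUDA: {found} (tested with {TESTED_CUDA})")
```

A different CUDA version is not necessarily a problem; the check only reports how far the environment is from the tested one.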
GenWarp uses pretrained models, which consist of both our finetuned models and publicly available third-party ones. Download all the models to the checkpoints directory or anywhere of your choice. You can do it manually or with the download_models.sh script.
./scripts/download_models.sh ./checkpoints
Note
Models and checkpoints provided below may be distributed under different licenses. Users are responsible for checking the licenses carefully themselves.
- Our finetuned models:
  - For details about each model, check out the model card.
  - multi-dataset model 1
    - download all files into checkpoints/multi1
  - multi-dataset model 2
    - download all files into checkpoints/multi2
- Pretrained models:
  - sd-vae-ft-mse
    - download config.json and diffusion_pytorch_model.safetensors to checkpoints/sd-vae-ft-mse
  - sd-image-variations-diffusers
    - download image_encoder/config.json and image_encoder/pytorch_model.bin to checkpoints/image_encoder
The final checkpoints directory must look like this:
genwarp
└── checkpoints
    ├── image_encoder
    │   ├── config.json
    │   └── pytorch_model.bin
    ├── multi1
    │   ├── config.json
    │   ├── denoising_unet.pth
    │   ├── pose_guider.pth
    │   └── reference_unet.pth
    ├── multi2
    │   ├── config.json
    │   ├── denoising_unet.pth
    │   ├── pose_guider.pth
    │   └── reference_unet.pth
    └── sd-vae-ft-mse
        ├── config.json
        └── diffusion_pytorch_model.safetensors
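After downloading, it can be useful to check that nothing is missing before loading the model. The sketch below compares a directory against the tree above; `missing_checkpoint_files` is a hypothetical helper written for illustration, not a function shipped with this repository.

```python
from pathlib import Path

# Expected files, copied from the directory tree above.
UNET_FILES = [
    "config.json",
    "denoising_unet.pth",
    "pose_guider.pth",
    "reference_unet.pth",
]
EXPECTED_FILES = {
    "image_encoder": ["config.json", "pytorch_model.bin"],
    "multi1": UNET_FILES,
    "multi2": UNET_FILES,
    "sd-vae-ft-mse": ["config.json", "diffusion_pytorch_model.safetensors"],
}


def missing_checkpoint_files(root="./checkpoints"):
    """Return the sorted relative paths of expected files absent under root."""
    root = Path(root)
    return sorted(
        f"{sub}/{name}"
        for sub, names in EXPECTED_FILES.items()
        for name in names
        if not (root / sub / name).is_file()
    )


if __name__ == "__main__":
    missing = missing_checkpoint_files("./checkpoints")
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected checkpoint files are present.")
```

If you downloaded the models to a different location, pass that path instead of ./checkpoints.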
The model requires depth maps to generate novel views, but a depth estimation model is not included in this repository. Users can instead install one of the publicly available monocular depth estimation (MDE) models.
ZoeDepth
We used and therefore recommend ZoeDepth.
git clone https://github.com/isl-org/ZoeDepth.git extern/ZoeDepth
Tip
To use ZoeDepth, please install requirements_dev.txt for additional packages.
Depth Anything V2
More recent models are also available. Depth Anything V2 is one of the SOTA models of depth estimation. You can use the metric depth version.
git clone https://github.com/DepthAnything/Depth-Anything-V2.git extern/Depth-Anything-V2
Download the models from their repository, and see the example notebook for usage with GenWarp. Note that they provide separate models for indoor and outdoor scenes.
Initialisation
Import the GenWarp class and instantiate it with a config. Set pretrained_model_path to the path of the checkpoints directory and select the model version with checkpoint_name. For more options, check out GenWarp.py.
import torch
from genwarp import GenWarp

genwarp_cfg = dict(
    pretrained_model_path='./checkpoints',  # Directory holding the checkpoints.
    checkpoint_name='multi1',               # Model version, e.g. 'multi1' or 'multi2'.
    half_precision_weights=True
)
genwarp_nvs = GenWarp(cfg=genwarp_cfg)

# Load MDE model.
depth_estimator = torch.hub.load(
    './extern/ZoeDepth',
    'ZoeD_N',
    source='local',
    pretrained=True,
    trust_repo=True
).to('cuda')
Prepare inputs
Load the input image and estimate the corresponding depth map. Create camera matrices for the intrinsic and extrinsic parameters. ops.py has helper functions to create matrices.
from PIL import Image
from torchvision.transforms.functional import to_tensor

# image_file is the path to your input image.
src_image = to_tensor(Image.open(image_file).convert('RGB'))[None].cuda()
src_depth = depth_estimator.infer(src_image)

import torch
from genwarp.ops import camera_lookat, get_projection_matrix

# fovy, near and far must be defined beforehand
# (see genwarp_inference.ipynb for a complete example).
proj_mtx = get_projection_matrix(
    fovy=fovy,
    aspect_wh=1.,
    near=near,
    far=far
)

src_view_mtx = camera_lookat(
    torch.tensor([[0., 0., 0.]]),   # From (0, 0, 0)
    torch.tensor([[-1., 0., 0.]]),  # Cast rays to -x
    torch.tensor([[0., 0., 1.]])    # z-up
)

tar_view_mtx = camera_lookat(
    torch.tensor([[-0.1, 2., 1.]]),  # Camera eye position
    torch.tensor([[-5., 0., 0.]]),   # Looking at.
    torch.tensor([[0., 0., 1.]])     # z-up
)

# Relative transform from the source view to the target view.
rel_view_mtx = (
    tar_view_mtx @ torch.linalg.inv(src_view_mtx.float())
).to(src_image)
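To make the role of the relative view matrix concrete, here is a dependency-free sketch of the same composition, target view times inverse source view. It is not the implementation in genwarp.ops: the `look_at` function below assumes a right-handed convention in which the view matrix maps world to camera coordinates and the camera looks down its negative z-axis, and `camera_lookat` may follow a different convention. All helper names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def look_at(eye, target, up):
    """World-to-camera 4x4 matrix (assumed convention: camera looks down -z)."""
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, up))
    true_up = _cross(right, forward)
    axes = [right, true_up, [-f for f in forward]]
    # Each row is one camera axis; the last column moves the eye to the origin.
    return [axis + [-_dot(axis, eye)] for axis in axes] + [[0., 0., 0., 1.]]


def rigid_inverse(m):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    rot_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    return [rot_t[i] + [-_dot(rot_t[i], t)] for i in range(3)] + [[0., 0., 0., 1.]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, point):
    """Apply a 4x4 matrix to a 3D point."""
    homogeneous = list(point) + [1.]
    return [_dot(m[i], homogeneous) for i in range(3)]


up = [0., 0., 1.]
src_view = look_at([0., 0., 0.], [-1., 0., 0.], up)
tar_view = look_at([-0.1, 2., 1.], [-5., 0., 0.], up)
rel_view = matmul(tar_view, rigid_inverse(src_view))

# Mapping a point's source-camera coordinates through rel_view gives the
# same result as viewing the world point directly from the target camera.
point = [-3., 0.5, 0.2]
via_rel = transform(rel_view, transform(src_view, point))
direct = transform(tar_view, point)
```

Under these assumptions, via_rel and direct agree up to floating-point error, which is the property the relative matrix relies on: it carries geometry expressed in the source camera frame into the target camera frame without going back through world coordinates.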
Warping
Call the GenWarp instance with the prepared inputs and inspect the results.
renders = genwarp_nvs(
    src_image=src_image,
    src_depth=src_depth,
    rel_view_mtx=rel_view_mtx,
    src_proj_mtx=proj_mtx,
    tar_proj_mtx=proj_mtx
)
# Outputs.
renders['synthesized'] # Generated image.
renders['warped'] # Depth based warping image (for comparison).
renders['mask'] # Mask image (mask=1 where visible pixels).
renders['correspondence'] # Correspondence map.
We provide a complete example in genwarp_inference.ipynb.
To access a Jupyter Notebook running in a docker container, you may need to use the host's network. For further details, please refer to the manual of Docker.
docker run --gpus=all -it --net host -v $(pwd):/workspace/genwarp -w /workspace/genwarp genwarp
Install requirements_dev.txt for additional packages to run the Jupyter Notebook.
An interactive live demo is also available. Start the Gradio demo by running the command below, then go to http://127.0.0.1:7860/. If you are running it on a remote server, be sure to forward port 7860.
Or you can just visit Spaces hosted by Hugging Face to try it now.
python app.py
@article{seo2024genwarp,
title={GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping},
author={Junyoung Seo and Kazumi Fukuda and Takashi Shibuya and Takuya Narihira and Naoki Murata and Shoukang Hu and Chieh-Hsin Lai and Seungryong Kim and Yuki Mitsufuji},
year={2024},
journal={arXiv preprint arXiv:2405.17251},
}
Our code is based on Moore-AnimateAnyone and the repositories it builds on. We thank the authors of the relevant repositories and papers.