Primary language: Python | License: MIT

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

NeurIPS 2024

Junyoung Seo*,1,3 Kazumi Fukuda1 Takashi Shibuya1 Takuya Narihira1 Naoki Murata1 Shoukang Hu1 Chieh-Hsin Lai1 Seungryong Kim†,3 Yuki Mitsufuji†,1,2
1Sony AI 2Sony Group Corporation 3KAIST
*Work done during an internship at Sony AI. †Co-corresponding authors.

Project Site   Spaces   Github   Models   arXiv

Introduction | Demo | Examples | How to use | Citation | Acknowledgements

concept image

Updates

  • 26/09/2024: Our paper was accepted to NeurIPS 2024
  • 13/09/2024: Added an example with Depth Anything V2
  • 27/08/2024: Code and demos released

Introduction

This repository is the official implementation of the paper "GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping". GenWarp generates novel-view images from a single input image, conditioned on camera poses. This repository provides the inference code for the model. For detailed information, please refer to the paper.

Framework

Demo

Here is a quick preview of GenWarp in action. Try it yourself on Spaces or run it locally on your machine. See the How to use section for more details. (Left) 3D scene reconstructed from the input image and the estimated depth. (Middle) Warped image. (Right) Generated image.

demo_video.mp4

Examples

Our model handles images from various domains, including indoor and outdoor scenes and even illustrations, under challenging camera viewpoint changes.

You can find examples on our project page and in our paper. Better still, try your favourite images on the live demo at Spaces.

Generated novel views can be used for 3D reconstruction. In the example below, we reconstructed a 3D scene via InstantSplat. We generated the video using this implementation.

genwarp_instantsplat.mp4

How to use

Environment

We tested our code on Ubuntu 20.04 with an NVIDIA A100 GPU. If you are using another platform such as Windows, consider using Docker. You can either add the packages to your existing Python environment or use Docker to build a Python environment. All commands below are expected to be run in the root directory of the repository.

Use Docker to build an environment

Note

You may want to change the username and uid variables in DockerFile. Please check DockerFile before running the commands below.

docker build . -t genwarp:latest
docker run --gpus=all -it -v $(pwd):/workspace/genwarp -w /workspace/genwarp genwarp

Inside the Docker container, install the packages as described below.

Add dependencies to your python environment

We tested the environment with Python >= 3.10 and CUDA 11.8. To add the mandatory dependencies, run the command below.

pip install -r requirements.txt

To run the development code, such as the example Jupyter notebook and the Gradio live demo, add the extra dependencies with the command below.

pip install -r requirements_dev.txt

Download pretrained models

GenWarp uses pretrained models, which consist of both our finetuned models and publicly available third-party ones. Download all the models to the checkpoints directory or to a location of your choice. You can do this manually or with the download_models.sh script.

Download script

./scripts/download_models.sh ./checkpoints

Manual download

Note

Models and checkpoints provided below may be distributed under different licenses. Users are responsible for checking the licenses carefully themselves.

  1. Our finetuned models:
  2. Pretrained models:
    • sd-vae-ft-mse
      • download config.json and diffusion_pytorch_model.safetensors to checkpoints/sd-vae-ft-mse
    • sd-image-variations-diffusers
      • download image_encoder/config.json and image_encoder/pytorch_model.bin to checkpoints/image_encoder

The final checkpoints directory must look like this:

genwarp
└── checkpoints
    ├── image_encoder
    │   ├── config.json
    │   └── pytorch_model.bin
    ├── multi1
    │   ├── config.json
    │   ├── denoising_unet.pth
    │   ├── pose_guider.pth
    │   └── reference_unet.pth
    ├── multi2
    │   ├── config.json
    │   ├── denoising_unet.pth
    │   ├── pose_guider.pth
    │   └── reference_unet.pth
    └── sd-vae-ft-mse
        ├── config.json
        └── diffusion_pytorch_model.safetensors
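
Before running inference, it can help to confirm that the layout above is complete. The following is a minimal sketch, not part of the repository: the function name `find_missing_checkpoints` is hypothetical, and the expected file list is simply copied from the directory tree above.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
EXPECTED_FILES = {
    "image_encoder": ["config.json", "pytorch_model.bin"],
    "multi1": [
        "config.json",
        "denoising_unet.pth",
        "pose_guider.pth",
        "reference_unet.pth",
    ],
    "multi2": [
        "config.json",
        "denoising_unet.pth",
        "pose_guider.pth",
        "reference_unet.pth",
    ],
    "sd-vae-ft-mse": ["config.json", "diffusion_pytorch_model.safetensors"],
}


def find_missing_checkpoints(root):
    """Return the relative paths of expected files that are absent under `root`."""
    root = Path(root)
    missing = []
    for subdir, filenames in EXPECTED_FILES.items():
        for name in filenames:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing


if __name__ == "__main__":
    missing = find_missing_checkpoints("./checkpoints")
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected checkpoint files are present.")
```

If you downloaded the models somewhere other than ./checkpoints, pass that directory to the function instead.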

Inference

Install MDE module

The model requires depth maps to generate novel views, but no depth estimation model is included in this repository. Users can install one of the publicly available Monocular Depth Estimation (MDE) models for this purpose.

ZoeDepth

We used and therefore recommend ZoeDepth.

git clone https://github.com/isl-org/ZoeDepth.git extern/ZoeDepth

Tip

To use ZoeDepth, please install requirements_dev.txt for additional packages.

Depth Anything V2

More recent models are also available. Depth Anything V2 is one of the SOTA models of depth estimation. You can use the metric depth version.

git clone https://github.com/DepthAnything/Depth-Anything-V2.git extern/Depth-Anything-V2

Download the models from their repository and see the example notebook for usage with GenWarp. Note that they provide separate models for indoor and outdoor scenes.

API

Initialisation

Import the GenWarp class and instantiate it with a config. Set pretrained_model_path to the path of the checkpoints directory and select the model version with checkpoint_name. For more options, check out GenWarp.py.

import torch
from genwarp import GenWarp

genwarp_cfg = dict(
    pretrained_model_path='./checkpoints',  # Directory containing the checkpoints.
    checkpoint_name='multi1',               # Model version to load.
    half_precision_weights=True
)
genwarp_nvs = GenWarp(cfg=genwarp_cfg)

# Load MDE model.
depth_estimator = torch.hub.load(
    './extern/ZoeDepth',
    'ZoeD_N',
    source='local',
    pretrained=True,
    trust_repo=True
).to('cuda')

Prepare inputs

Load the input image and estimate the corresponding depth map. Create camera matrices for the intrinsic and extrinsic parameters. ops.py has helper functions to create matrices.

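import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

from genwarp.ops import camera_lookat, get_projection_matrix

# Load the input image as a (1, 3, H, W) tensor on the GPU.
# `image_file` is the path to your input image.
src_image = to_tensor(Image.open(image_file).convert('RGB'))[None].cuda()
# Estimate the depth map with the MDE model loaded earlier.
src_depth = depth_estimator.infer(src_image)

# `fovy`, `near`, and `far` must be defined beforehand.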
proj_mtx = get_projection_matrix(
    fovy=fovy,
    aspect_wh=1.,
    near=near,
    far=far
)

src_view_mtx = camera_lookat(
    torch.tensor([[0., 0., 0.]]),  # From (0, 0, 0)
    torch.tensor([[-1., 0., 0.]]), # Cast rays to -x
    torch.tensor([[0., 0., 1.]])   # z-up
)

tar_view_mtx = camera_lookat(
    torch.tensor([[-0.1, 2., 1.]]), # Camera eye position
    torch.tensor([[-5., 0., 0.]]),  # Looking at.
    torch.tensor([[0., 0., 1.]])    # z-up
)

rel_view_mtx = (
    tar_view_mtx @ torch.linalg.inv(src_view_mtx.float())
).to(src_image)
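
The relative view matrix computed above can be read as a change of coordinates. As a sketch, assuming `camera_lookat` returns world-to-camera view matrices (a common convention, not confirmed here), a homogeneous world point $\mathbf{x}_w$ is expressed in the source and target camera frames by $V_{\mathrm{src}}$ and $V_{\mathrm{tar}}$, and eliminating $\mathbf{x}_w$ gives the matrix that maps source-camera coordinates directly to target-camera coordinates:

```latex
\begin{aligned}
\mathbf{x}_{\mathrm{src}} &= V_{\mathrm{src}}\,\mathbf{x}_w, \qquad
\mathbf{x}_{\mathrm{tar}} = V_{\mathrm{tar}}\,\mathbf{x}_w \\
\mathbf{x}_{\mathrm{tar}} &= V_{\mathrm{tar}}\,V_{\mathrm{src}}^{-1}\,\mathbf{x}_{\mathrm{src}}
  = V_{\mathrm{rel}}\,\mathbf{x}_{\mathrm{src}}, \qquad
V_{\mathrm{rel}} = V_{\mathrm{tar}}\,V_{\mathrm{src}}^{-1}
\end{aligned}
```

Under that assumption, $V_{\mathrm{rel}}$ corresponds to `tar_view_mtx @ torch.linalg.inv(src_view_mtx)` in the code, and it reduces to the identity when the source and target cameras coincide.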

Warping

Call the GenWarp instance with the prepared inputs and inspect the results.

renders = genwarp_nvs(
    src_image=src_image,
    src_depth=src_depth,
    rel_view_mtx=rel_view_mtx,
    src_proj_mtx=proj_mtx,
    tar_proj_mtx=proj_mtx
)

# Outputs.
renders['synthesized']     # Generated image.
renders['warped']          # Depth based warping image (for comparison).
renders['mask']            # Mask image (mask=1 at visible pixels).
renders['correspondence']  # Correspondence map.
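
To give some intuition for what a depth-based warp does, here is a generic sketch for a single pixel. It is not GenWarp's implementation: the function `warp_pixel` is hypothetical, and it assumes a plain pinhole camera looking along +z with focal length `focal` and principal point `(cx, cy)` in pixels, which may differ from the conventions used in this repository.

```python
def warp_pixel(u, v, depth, focal, cx, cy, rotation, translation):
    """Map pixel (u, v) with known depth from the source view to the target view.

    `rotation` (3x3 nested lists) and `translation` (length 3) take
    source-camera coordinates to target-camera coordinates.
    Returns (u', v', z'), or None if the point lies behind the target camera.
    """
    # 1. Unproject the pixel to a 3D point in the source camera frame.
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    point = (x, y, depth)

    # 2. Apply the relative rigid transform between the two cameras.
    moved = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]

    # 3. Project the transformed point into the target image.
    if moved[2] <= 0:
        return None
    u_new = focal * moved[0] / moved[2] + cx
    v_new = focal * moved[1] / moved[2] + cy
    return (u_new, v_new, moved[2])


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # A point 2 units away shifts by 0.5 units to the left in the target frame.
    print(warp_pixel(64, 64, 2.0, 100.0, 64.0, 64.0, identity, [-0.5, 0.0, 0.0]))
```

Repeating this for every source pixel produces a warped image. Target pixels that no source pixel lands on are left empty, and those are the regions a generative model has to fill in.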

Example notebook

We provide a complete example in genwarp_inference.ipynb.

To access a Jupyter Notebook running in a docker container, you may need to use the host's network. For further details, please refer to the manual of Docker.

docker run --gpus=all -it --net host -v $(pwd):/workspace/genwarp -w /workspace/genwarp genwarp

Install requirements_dev.txt for additional packages to run the Jupyter Notebook.

Gradio live demo

An interactive live demo is also available. Start the Gradio demo by running the command below, then go to http://127.0.0.1:7860/. If you are running it on a remote server, be sure to forward port 7860.

python app.py

Alternatively, visit Spaces hosted by Hugging Face to try it right away.
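
If the browser cannot reach the demo, especially when it runs on a remote server, it may help to check whether anything is listening on port 7860 from your machine. The snippet below is a small standard-library sketch; the helper name `is_port_open` is hypothetical and not part of this repository.

```python
import socket


def is_port_open(host="127.0.0.1", port=7860, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Port 7860 is reachable; open http://127.0.0.1:7860/ in a browser.")
    else:
        print("Nothing is listening on port 7860; check that the demo is running "
              "and that the port is forwarded.")
```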

Citation

@article{seo2024genwarp,
  title={GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping},
  author={Junyoung Seo and Kazumi Fukuda and Takashi Shibuya and Takuya Narihira and Naoki Murata and Shoukang Hu and Chieh-Hsin Lai and Seungryong Kim and Yuki Mitsufuji},
  year={2024},
  journal={arXiv preprint arXiv:2405.17251},
}

Acknowledgements

Our code is based on Moore-AnimateAnyone and the repositories it builds on. We thank the authors of the relevant repositories and papers.