MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion (NeurIPS 2023, spotlight)

Citation

If you use our work in your research, please cite it as follows:

@article{tang2023MVDiffusion,
  title={MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion},
  author={Tang, Shitao and Zhang, Fuyang and Chen, Jiacheng and Wang, Peng and Furukawa, Yasutaka},
  journal={arXiv preprint arXiv:2307.01097},
  year={2023}
}

Updates: MVDiffusion can now extrapolate a single perspective image into a 360-degree panorama. The paper has been updated accordingly.

Installation

Install the necessary packages by running the following command:

pip install -r requirements.txt

Model Zoo

We provide baseline results and pretrained model weights.

Please put the downloaded weight files in 'MVDiffusion/weights'.

Demo

Test the demo by running:

  • Text conditioned generation
python demo.py --text "This kitchen is a charming blend of rustic and modern, featuring a large reclaimed wood island with marble countertop, a sink surrounded by cabinets. To the left of the island, a stainless-steel refrigerator stands tall. To the right of the sink, built-in wooden cabinets painted in a muted."
  • Dual-conditioned generation
python demo.py --text_path assets/prompts.txt --image_path assets/outpaint_example.png

Data

├── data
    ├── mp3d_skybox
      ├── train.npy
      ├── test.npy
      ├── 5q7pvUzZiYa
        ├── blip3
        ├── matterport_skybox_images
      ├── 1LXtFkjw3qL
      ├── ....
├── data
    ├── scannet
      ├── train
        ├── scene0435_01
          ├── color
          ├── depth
          ├── intrinsic
          ├── pose
          ├── prompt
          ├── key_frame_0.6.txt
          ├── valid_frames.npy
      ├── test

Testing

Execute the following scripts for testing:

  • sh test_pano.sh: Generate 8 multi-view panoramic images in the Matterport3D testing dataset.
  • sh test_pano_outpaint.sh: Generate 8 multi-view images conditioned on a single-view image (outpainting) in the Matterport3D testing dataset.
  • sh test_depth_fix_frames.sh: Generate 12 depth-conditioned images in the ScanNet testing dataset.
  • sh test_depth_fix_interval.sh: Generate a sequence of depth-conditioned images (every 20 frames) in the ScanNet testing dataset.
  • sh test_depth_two_stage.sh: Generate a sequence of depth-conditioned key-frame images, then interpolate the in-between frames, in the ScanNet testing dataset.

After running either sh test_depth_fix_interval.sh or sh test_depth_two_stage.sh, you can use TSDF fusion to obtain a textured mesh.
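As an illustration, here is a minimal TSDF-fusion sketch using Open3D. The directory layout, file names, and voxel parameters below are assumptions; adapt them to wherever your generated color images, depth maps, intrinsics, and poses are stored (the ScanNet convention of 16-bit depth in millimeters and 4x4 camera-to-world pose files is assumed).

import numpy as np
import open3d as o3d
from pathlib import Path

# Hypothetical output layout; adjust to your generated scene.
root = Path("outputs/scene0435_01")
color_files = sorted((root / "color").glob("*.png"))
depth_files = sorted((root / "depth").glob("*.png"))   # 16-bit depth in millimeters (ScanNet convention)
pose_files = sorted((root / "pose").glob("*.txt"))     # 4x4 camera-to-world matrices

# Pinhole intrinsics loaded from a ScanNet-style intrinsic file (assumed path).
K = np.loadtxt(root / "intrinsic" / "intrinsic_depth.txt")[:3, :3]
height, width = np.asarray(o3d.io.read_image(str(depth_files[0]))).shape[:2]
intrinsic = o3d.camera.PinholeCameraIntrinsic(width, height, K[0, 0], K[1, 1], K[0, 2], K[1, 2])

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.02,  # 2 cm voxels
    sdf_trunc=0.08,     # truncation distance in meters
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
)

for color_f, depth_f, pose_f in zip(color_files, depth_files, pose_files):
    color = o3d.io.read_image(str(color_f))
    depth = o3d.io.read_image(str(depth_f))
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, depth_trunc=6.0, convert_rgb_to_intensity=False
    )
    cam_to_world = np.loadtxt(pose_f)
    # integrate() expects a world-to-camera extrinsic, hence the inverse.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(cam_to_world))

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh("fused_mesh.ply", mesh)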

Training

Execute the following scripts for training:

  • sh train_pano.sh: Train the panoramic image generation model.
  • sh train_pano_outpaint.sh: Train the panoramic image outpainting model.
  • sh train_depth.sh: Train the depth-conditioned generation model.

Custom data

Panorama generation:

  1. Convert the panorama into 6 skybox images using the provided tool, Equirec2Perspec. This yields left, right, front, back, up, and down images.
  2. Convert the panorama into 8 perspective images, each capturing a 45-degree horizontal view; four of them (the left, right, front, and back views) overlap with the skybox images (see the example sketch after this list).
  3. Once you have the perspective images, use BLIP2 to generate prompts from them.
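For step 2, a minimal sketch using the third-party py360convert package (used here as a stand-in for Equirec2Perspec; the 90-degree field of view and 512x512 output size are assumptions, so match them to your setup) could look like this:

import numpy as np
import py360convert  # pip install py360convert (assumed substitute for Equirec2Perspec)
from PIL import Image

# Load the equirectangular panorama as an H x W x 3 array.
pano = np.array(Image.open("pano.png").convert("RGB"))

# Render 8 perspective views spaced 45 degrees apart around the horizon.
for i, yaw in enumerate(range(0, 360, 45)):
    view = py360convert.e2p(pano, fov_deg=90, u_deg=yaw, v_deg=0, out_hw=(512, 512))
    Image.fromarray(view.astype(np.uint8)).save(f"perspective_{i}.png")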

Multi-view Depth-to-Image Generation:

  1. Use the ScanNet format: follow the directory structure and file layout of the ScanNet dataset shown above.
  2. Use BLIP2 to generate prompts from each perspective image (see the sketch after this list).
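For the prompt files, a minimal sketch using the Hugging Face transformers implementation of BLIP-2 (the checkpoint name and the input/output paths are assumptions) could look like this:

from pathlib import Path

import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Checkpoint choice is an assumption; any BLIP-2 captioning checkpoint should work.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

image_dir = Path("data/scannet/train/scene0435_01/color")    # per-view RGB images
prompt_dir = Path("data/scannet/train/scene0435_01/prompt")  # one caption file per image (assumed layout)
prompt_dir.mkdir(parents=True, exist_ok=True)

for image_path in sorted(image_dir.glob("*.jpg")):
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device, dtype)
    generated_ids = model.generate(**inputs, max_new_tokens=40)
    caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
    (prompt_dir / f"{image_path.stem}.txt").write_text(caption)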

License

This project is licensed under the terms of the MIT license.

Contact

For any questions, feel free to contact us at shitaot@sfu.ca.