Official PyTorch implementation of the paper:
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
Hansheng Chen1,
Ruoxi Shi2,
Yulin Liu2,
Bokui Shen3,
Jiayuan Gu2,
Gordon Wetzstein1,
Hao Su2,
Leonidas Guibas1
1Stanford University, 2UCSD, 3Apparate Labs
[project page] [Web UI] [Web UI 🤗] [paper]
[Demo video: main_vid.mp4]
- Add Zero123++ v1.2 to the Web UI
- Release the complete codebase, including the Web UI that can be deployed on your own machine
- Add non-Gradio scripts and instructions
This project is a WIP. New models with better quality may be added in the future.
The code has been tested in the environment described as follows:
- Linux (tested on Ubuntu 20 and above)
- CUDA Toolkit 11.8 and above
- PyTorch 2.1 and above
- FFmpeg, x264 (optional, for exporting videos)
Other dependencies can be installed via pip install -r requirements.txt.
An example of installation commands is shown below (adjust the CUDA version to match your system):
# Export the PATH of CUDA toolkit
export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
# Create conda environment
conda create -y -n mvedit python=3.10
conda activate mvedit
# Install FFmpeg (optional)
conda install -c conda-forge ffmpeg x264
# Install PyTorch
conda install pytorch==2.1.2 torchvision==0.16.2 pytorch-cuda=12.1 -c pytorch -c nvidia
# Clone this repo and install other dependencies
git clone https://github.com/Lakonik/MVEdit && cd MVEdit
pip install -r requirements.txt
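After installation, you can optionally confirm that PyTorch was installed with CUDA support (a quick sanity check, not part of the official setup):

# Optional: verify that PyTorch detects the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

The second value should print True; if it prints False, the CUDA toolkit and PyTorch builds above are likely mismatched.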
This codebase also works on Windows systems, but it has not been tested extensively. Please refer to Issue #8 for more information about the environment setup on Windows.
We recommend using the Gradio Web UI and its APIs. You need a GPU with at least 24GB of VRAM to run the Web UI.
Run the following command to start the Web UI:
python app.py --advanced --empty-cache --unload-models
The Web UI will be available at http://localhost:7860. If you add the --share
flag, a temporary public URL will be generated for you to share the Web UI with others.
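For example, to launch a publicly shareable session with the same options as above:

python app.py --advanced --empty-cache --unload-models --share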
All models are loaded on demand and downloaded automatically, so the first run will take a long time. If a download fails, check your network connection to GitHub, Google Drive and Hugging Face.
To view other options, run:
python app.py -h
After starting the Web UI, the API docs will be available at http://localhost:7860/?view=api. The docs are automatically generated by Gradio, and the data types and default values may be incorrect. Please use the default values in the Web UI as a reference.
An example of using the Zero123++ v1.2 image-to-3D API (without --advanced
) is shown below:
import os
import shutil
import tqdm
from gradio_client import Client
in_dir = 'demo/examples_images'
out_dir = 'exp'
os.makedirs(out_dir, exist_ok=True)
client = Client('https://mvedit.hanshengchen.com/') # Use your own URL here
for img_name in tqdm.tqdm(os.listdir(in_dir)):
    img_path = os.path.join(in_dir, img_name)
    seed = 42
    seg_result = client.predict(
        img_path,
        api_name='/image_segmentation')
    zero123_result = client.predict(
        seed,
        seg_result,
        api_name='/img_to_3d_1_2_zero123plus')
    # output path to the .glb mesh
    mvedit_result = client.predict(
        seed,
        seg_result,
        '',  # 'Prompt' Textbox component
        '',  # 'Negative prompt' Textbox component
        'DPMSolverMultistep',  # 'Sampling method' Dropdown component
        24,  # 'Sampling steps' Slider component
        0.5,  # 'Denoising strength' Slider component
        False,  # 'Random initialization' Checkbox component
        7,  # 'CFG scale' Slider component
        True,  # 'Texture super-resolution' Checkbox component
        'DPMSolverSDEKarras',  # 'Sampling method' Dropdown component (texture super-resolution)
        24,  # 'Sampling steps' Slider component (texture super-resolution)
        0.4,  # 'Denoising strength' Slider component (texture super-resolution)
        False,  # 'Random initialization' Checkbox component (texture super-resolution)
        7,  # 'CFG scale' Slider component (texture super-resolution)
        *zero123_result,
        api_name='/img_to_3d_1_2_zero123plus_to_mesh')
    shutil.move(mvedit_result, os.path.join(out_dir, os.path.splitext(img_name)[0] + '.glb'))
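Note that this client script only depends on the gradio_client package (no GPU is required on the caller's machine); if you are not running the full MVEdit environment, it can be installed separately with pip install gradio_client.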
Instructions for advanced usage of MVEdit pipelines will be added soon.
This codebase is built upon the following repositories:
- Base library modified from SSDNeRF
- NeRF renderer and DMTet modified from Stable-DreamFusion
- Mesh I/O modified from DreamGaussian
- Zero123++ for image-to-3D initialization
- IP-Adapter for extra conditioning
- TRACER for background removal
- LoFTR for pose estimation in image-to-3D
- Omnidata for normal prediction in image-to-3D
- Image Packer for mesh preprocessing
@misc{mvedit2024,
title={Generic 3D Diffusion Adapter Using Controlled Multi-View Editing},
author={Hansheng Chen and Ruoxi Shi and Yulin Liu and Bokui Shen and Jiayuan Gu and Gordon Wetzstein and Hao Su and Leonidas Guibas},
year={2024},
eprint={2403.12032},
archivePrefix={arXiv},
primaryClass={cs.CV}
}