Single Image to 3D using Cross-Domain Diffusion
Wonder3D reconstructs highly detailed textured meshes from a single-view image in only 2 to 3 minutes. Wonder3D first generates consistent multi-view normal maps with corresponding color images via a cross-domain diffusion model, and then leverages a novel normal fusion method to achieve fast and high-quality reconstruction.
If you produce interesting reconstructions and would like to share them with others, you are welcome to upload the input image and the reconstructed mesh to this OneDrive repo.
Data structure:
{yourname}/{scenename}-input.png # the input image
{yourname}/{scenename}-screenshot.png # a front view screenshot of the reconstructed mesh
{yourname}/{scenename}-mesh.obj # the reconstructed mesh, .obj or .ply
# example:
# create a folder named `xxlong`, then upload the files to the folder
xxlong/apple-input.png
xxlong/apple-screenshot.png
xxlong/apple-mesh.obj
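The naming scheme above can be generated with a small helper. This is only a minimal sketch: `upload_paths` is a hypothetical name and is not part of this repository.

```python
from pathlib import PurePosixPath
from typing import List


def upload_paths(yourname: str, scenename: str, mesh_ext: str = "obj") -> List[str]:
    """Return the three relative paths expected for one shared result."""
    if mesh_ext not in ("obj", "ply"):
        raise ValueError("the mesh must be .obj or .ply")
    folder = PurePosixPath(yourname)
    return [
        str(folder / f"{scenename}-input.png"),        # the input image
        str(folder / f"{scenename}-screenshot.png"),   # front-view screenshot of the mesh
        str(folder / f"{scenename}-mesh.{mesh_ext}"),  # the reconstructed mesh
    ]


print(upload_paths("xxlong", "apple"))
```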
Our overarching mission is to enhance the speed, affordability, and quality of 3D AIGC, making the creation of 3D content accessible to all. While significant progress has been achieved in recent years, we acknowledge there is still a substantial journey ahead. We enthusiastically invite you to engage in discussions and explore potential collaborations in any capacity. If you're interested in connecting or partnering with us, please don't hesitate to reach out via email (xxlong@connect.hku.hk).
The repo is still under construction; thanks for your patience.
- Local gradio demo.
- Detailed tutorial.
- GUI demo for mesh reconstruction
- Windows support
- Docker support
- Inference code and pretrained models.
- Huggingface demo.
- New model trained on the whole Objaverse dataset.
- Install packages in requirements.txt (Linux system).
conda create -n wonder3d
conda activate wonder3d
pip install -r requirements.txt
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
Windows system. Thanks to @fefespn for his help! (Not tested; you may refer to issues #12 and #29.)
conda create --name venv_wonder3d -y python=3.8
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
pip install fire diffusers==0.19.3 transformers bitsandbytes accelerate gradio rembg segment_anything
pip install einops omegaconf pytorch-lightning==1.9.5 torch_efficient_distloss nerfacc==0.3.3 PyMCubes trimesh
pip install https://download.pytorch.org/whl/cu118/xformers-0.0.22.post4%2Bcu118-cp38-cp38-manylinux2014_x86_64.whl
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
- Download the checkpoints and place them in the root folder.
- Make sure you have the following models.
Wonder3D
|-- ckpts
|-- unet
|-- scheduler.bin
...
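A quick way to confirm the download landed in the right place is to check for the expected entries before running inference. The sketch below is not part of the repository: `missing_checkpoint_files` is a hypothetical helper, and the `EXPECTED` list assumes that `unet` and `scheduler.bin` sit directly inside `ckpts`, which is how the tree above appears to read. Entries elided by `...` are deliberately not listed.

```python
from pathlib import Path
from typing import Iterable, List


def missing_checkpoint_files(root: str, expected: Iterable[str]) -> List[str]:
    """Return the entries of `expected` that do not exist under `root`."""
    root_path = Path(root)
    return [rel for rel in expected if not (root_path / rel).exists()]


# Assumed layout: both entries sit directly inside ckpts/.
EXPECTED = ["ckpts/unet", "ckpts/scheduler.bin"]

if __name__ == "__main__":
    # Run from the repository root folder.
    missing = missing_checkpoint_files(".", EXPECTED)
    print("missing:", missing if missing else "none")
```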
- Predict the foreground mask as the alpha channel. We use Clipdrop to segment the foreground object interactively. You may also use rembg to remove the backgrounds.
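# !pip install rembg
import rembg
from PIL import Image

result = Image.open("input.png")  # placeholder path: replace with your own image
result = rembg.remove(result)  # returns an RGBA image; the alpha channel is the foreground mask
result.show()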
- Run Wonder3D to produce multiview-consistent normal maps and color images. You can then check the results in the folder ./outputs. (We use rembg to remove the backgrounds of the results, but the segmentations are not always perfect. Consider using Clipdrop to get masks for the generated normal maps and color images, since mask quality significantly influences the quality of the reconstructed mesh.)
accelerate launch --config_file 1gpu.yaml test_mvdiffusion_seq.py \
--config mvdiffusion-joint-ortho-6views.yaml
or
bash run_test.sh
python gradio_app.py
- Mesh Extraction
cd ./instant-nsr-pl
bash run.sh output_folder_path scene_name
Our generated normals and color images are defined in orthographic views, so the reconstructed mesh is also in orthographic camera space. If you use MeshLab to view the meshes, you can click Toggle Orthographic Camera in the View tab.
Since there have been many complaints about the Windows setup of instant-nsr-pl, we also provide a NeuS-based reconstruction, which may avoid the dependency problems.
NeuS consumes less GPU memory and favors smooth surfaces without parameter tuning. However, NeuS takes more time and its textures may be less sharp. If you are not sensitive to time, we recommend NeuS for optimization due to its robustness.
cd ./NeuS
bash run.sh output_folder_path scene_name
Q: What are the elevation and azimuth degrees of the generated views?
A: Unlike prior works such as Zero123, SyncDreamer and One2345, which adopt an object world system, our views are defined in the camera system of the input image. The six views lie in the plane with 0 elevation degree in the camera system of the input image. Therefore, we don't need to estimate an elevation degree for the input image. The azimuth degrees of the six views are 0, 45, 90, 180, -90, -45 respectively.
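To make the view layout concrete, the sketch below places a camera for each of the six azimuths on a circle at zero elevation. The axis convention is an assumption made for illustration and is not taken from this repository: y is up, the input view (azimuth 0) sits on the +z axis, and positive azimuth rotates toward +x. `camera_position` is a hypothetical helper name.

```python
import math
from typing import Tuple

# Azimuths of the six generated views, in degrees, as listed above.
AZIMUTHS_DEG = [0, 45, 90, 180, -90, -45]


def camera_position(azimuth_deg: float, radius: float = 1.0) -> Tuple[float, float, float]:
    """Camera centre on a circle at zero elevation.

    Assumed convention (not from the repository): y is up, azimuth 0 is on
    the +z axis, and positive azimuth rotates toward +x.
    """
    a = math.radians(azimuth_deg)
    return (radius * math.sin(a), 0.0, radius * math.cos(a))


for az in AZIMUTHS_DEG:
    x, y, z = camera_position(az)
    print(f"azimuth {az:>4}: ({x:+.3f}, {y:+.3f}, {z:+.3f})")
```

Because the elevation is fixed at 0, the y coordinate is zero for every view; only the azimuth changes between them.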
Q: What is the focal length of the generated views?
A: We assume the input images are captured by an orthographic camera, so the generated views are also in orthographic space. This design enables our model to keep strong generalization on unreal images, but it may sometimes suffer from focal-length distortions on real-captured images.
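The difference between the two camera models can be illustrated with a toy projection. This is a generic sketch, not code from this repository: it assumes a camera looking along +z and an arbitrary focal length of 1.0, and both function names are hypothetical.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def project_orthographic(p: Point3) -> Tuple[float, float]:
    """Orthographic projection: drop depth, so image size is independent of z."""
    x, y, _ = p
    return (x, y)


def project_perspective(p: Point3, focal: float = 1.0) -> Tuple[float, float]:
    """Pinhole projection: scale by focal / depth, so nearer points look larger."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal * x / z, focal * y / z)


# Two points with the same x and y but different depths.
near, far = (1.0, 1.0, 2.0), (1.0, 1.0, 4.0)
print(project_orthographic(near), project_orthographic(far))
print(project_perspective(near), project_perspective(far))
```

Under the orthographic model both points land on the same image location, while under the pinhole model the nearer point projects farther from the image centre. That depth-dependent scaling in real photographs is what the orthographic assumption does not capture.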
We have borrowed code extensively from the following repositories. Many thanks to the authors for sharing their code.
If you find this repository useful in your project, please cite the following work. :)
@misc{long2023wonder3d,
title={Wonder3D: Single Image to 3D using Cross-Domain Diffusion},
author={Xiaoxiao Long and Yuan-Chen Guo and Cheng Lin and Yuan Liu and Zhiyang Dou and Lingjie Liu and Yuexin Ma and Song-Hai Zhang and Marc Habermann and Christian Theobalt and Wenping Wang},
year={2023},
eprint={2310.15008},
archivePrefix={arXiv},
primaryClass={cs.CV}
}