[ICLR 2024] Code for LEAP: Liberate Sparse-view 3D Modeling from Camera Poses

LEAP: Liberate Sparse-view 3D Modeling from Camera Poses


Hanwen Jiang, Zhenyu Jiang, Yue Zhao, Qixing Huang

Installation

conda create --name leap python=3.9
conda activate leap

# Install pytorch or use your own torch version. We use pytorch 2.0.1
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

# Install pytorch3d, please follow https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md
# We use pytorch3d-0.7.4-py39_cu117_pyt201

# (Optional) Install flash attention to enable training on limited GPU memory
# We tested with flash attention 1.0.7
# Please follow https://github.com/Dao-AILab/flash-attention
# Using flash attention during training will lead to slightly worse performance
# If you don't want to install flash attention, please comment out the related code in encoder.py and lifting.py

pip install -r requirements.txt 
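After installation, a quick sanity check can confirm that the packages named above are importable. The following is a minimal sketch and not part of the repository: the helper name check_environment is hypothetical, and it assumes the usual import names torch, torchvision, pytorch3d, and flash_attn (the optional flash attention package).

```python
import importlib.util
import sys


def check_environment(required=("torch", "torchvision", "pytorch3d"),
                      optional=("flash_attn",)):
    """Report which packages from the installation steps are importable.

    Returns a (report, missing) pair: `report` maps each package name to
    True/False, and `missing` lists the required packages that were not found.
    """
    report = {}
    for name in tuple(required) + tuple(optional):
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    missing = [name for name in required if not report[name]]
    return report, missing


if __name__ == "__main__":
    report, missing = check_environment()
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, found in report.items():
        print(f"{name:12s} {'found' if found else 'not found'}")
    if missing:
        print("Missing required packages:", ", ".join(missing))
```

If flash_attn is reported as not found, the flash attention code in encoder.py and lifting.py still needs to be commented out as described above; this check only reports what is installed.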

Pre-trained Weights

We provide model weights trained on the OmniObject3D dataset and the Kubric ShapeNet dataset.

Run LEAP demo

  • Download the pre-trained weights and set the pre-trained weight path at line 34 of demo.py.
  • Run the demo with ./demo.sh.
  • You can also capture your own images and use the segmented images as inputs.

Train LEAP

Download Dataset

  • Follow the instructions from OmniObject3D, FORGE, and Zero123 to download the three object-centric datasets.
  • Follow the instructions from PixelNeRF to download the DTU dataset.
  • Set self.root in the dataloaders to the location of your downloaded data.
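One way to catch a wrong dataset path early is to validate it before assigning it to self.root. This is a sketch under assumptions, not code from the repository: the helper resolve_dataset_root and the environment variable LEAP_DATA_ROOT are hypothetical names introduced only for illustration.

```python
import os
from pathlib import Path


def resolve_dataset_root(default_root, env_var="LEAP_DATA_ROOT"):
    """Return the dataset root as a Path, failing early if it does not exist.

    `env_var` is a hypothetical override: when it is set, its value is used
    instead of `default_root`.
    """
    root = Path(os.environ.get(env_var, default_root)).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset root not found: {root}")
    return root


# Hypothetical usage inside a dataloader's __init__:
#     self.root = resolve_dataset_root("/path/to/your/dataset")
```

Failing at construction time gives a clearer error than a missing-file exception raised in the middle of the first training epoch.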

Training

  • Use ./train.sh and change your training config accordingly.
  • The default training configurations require up to about 300GB of GPU memory in total, e.g., 8 A40 GPUs with 40GB of VRAM each.
  • If you don't have enough resources, please consider using flash attention.
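The memory figure above can be turned into a rough pre-flight check. The sketch below only compares aggregate VRAM against the roughly 300GB quoted for the default configurations; the function names are hypothetical, and it assumes memory use is spread evenly across GPUs, which may not hold in practice.

```python
def total_vram_gb(num_gpus, vram_per_gpu_gb):
    """Aggregate VRAM across all GPUs, in GB."""
    return num_gpus * vram_per_gpu_gb


def fits_budget(num_gpus, vram_per_gpu_gb, required_gb=300):
    """True if aggregate VRAM meets the quoted requirement (default 300GB)."""
    return total_vram_gb(num_gpus, vram_per_gpu_gb) >= required_gb


# The example from the text: 8 GPUs x 40GB = 320GB, which covers 300GB.
print(fits_budget(8, 40))  # True
# Half the GPUs: 4 x 40GB = 160GB, which does not.
print(fits_budget(4, 40))  # False
```

When the check fails, flash attention or a smaller training configuration is the suggested fallback.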

Evaluate LEAP

  • Use ./eval.sh and change your evaluation config accordingly.

Known Issues

  • The model trained on OmniObject3D cannot predict densities accurately on real images. Please use the Kubric pre-trained weights for real-image demos instead.
  • The model overfits to the training intrinsics. For evaluation, please use the training intrinsics on any novel evaluation dataset.
  • The Objaverse pre-trained model will not be released. We found that training requires at least 256 GPUs to converge well on Objaverse. In addition, Objaverse is highly noisy and contains many bad shapes, so please clean the samples before training.
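For the intrinsics issue listed above, evaluation on a new dataset amounts to replacing each sample's camera intrinsics with the ones used in training. The following is a minimal sketch, not the repository's data pipeline: the sample layout, the "K" key, and all numeric values are placeholder assumptions, and the actual training intrinsics should be taken from the training config.

```python
import copy


def make_intrinsics(fx, fy, cx, cy):
    """Build a 3x3 pinhole intrinsics matrix as nested lists."""
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def use_training_intrinsics(sample, train_K, key="K"):
    """Return a copy of `sample` whose intrinsics are replaced by `train_K`.

    `sample` is a hypothetical dict holding one evaluation example;
    the original dict is left unmodified.
    """
    out = dict(sample)
    out[key] = copy.deepcopy(train_K)
    return out


# Placeholder values only; they are not the intrinsics used by LEAP.
train_K = make_intrinsics(fx=100.0, fy=100.0, cx=64.0, cy=64.0)
eval_sample = {"name": "example", "K": make_intrinsics(500.0, 500.0, 320.0, 240.0)}
eval_sample = use_training_intrinsics(eval_sample, train_K)
```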

Citation

@article{jiang2022LEAP,
   title={LEAP: Liberate Sparse-view 3D Modeling from Camera Poses},
   author={Jiang, Hanwen and Jiang, Zhenyu and Zhao, Yue and Huang, Qixing},
   journal={ArXiv},
   year={2023},
   volume={2310.01410}
}