
PatchFusion

An End-to-End Tile-Based Framework
for High-Resolution Monocular Metric Depth Estimation


[Project Website] | [arXiv Paper]
Zhenyu Li, Shariq Farooq Bhat, Peter Wonka.
KAUST

DEMO

Our official demo is available here! Thanks to hysts for the kind support!

We also provide an experimental demo (with 80 GB of memory) for you to play around with. Note that this link may expire soon.

Environment setup

The project depends on:

  • pytorch (Main framework)
  • timm (Backbone helper for MiDaS)
  • ZoeDepth (Main baseline)
  • ControlNet (For potential application)
  • pillow, matplotlib, scipy, h5py, opencv (utilities)

Install the environment using environment.yml:

Using mamba (or micromamba, fastest):

mamba env create -n patchfusion --file environment.yml
mamba activate patchfusion

Using conda:

conda env create -n patchfusion --file environment.yml
conda activate patchfusion
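
Optionally, you can sanity-check the installation (the import names below are the standard ones for the listed dependencies; the CUDA check only matters if you plan to run on a GPU):

python -c "import torch, timm, cv2; print(torch.__version__, torch.cuda.is_available())"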

Pre-trained Models

Download our pre-trained model here and put the checkpoint at nfs/patchfusion_u4k.pt before running the following steps.

If you want to try the ControlNet demo, please also download the pre-trained ControlNet model here and put the checkpoint at nfs/control_sd15_depth.pth.
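
For reference, assuming you run all commands from the repository root, the expected checkpoint layout is:

mkdir -p nfs
# nfs/patchfusion_u4k.pt      <- PatchFusion checkpoint (required)
# nfs/control_sd15_depth.pth  <- ControlNet checkpoint (only needed for the ControlNet demo)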

Gradio Demo

We provide a UI demo built with Gradio. To get started, install the UI requirements:

pip install -r ui_requirements.txt

Launch the Gradio UI for depth estimation or image-to-3D:

python ./ui_prediction.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

Launch the Gradio UI for depth-guided image generation with ControlNet:

python ./ui_generative.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

User Inference

  1. Put your images in a folder, e.g. path/to/your/folder

  2. Run the following command:

    python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --show --show_path path/to/show --save --save_path path/to/save --mode r128 --boundary 0 --blur_mask
  3. Check visualization results in path/to/show and depth results in path/to/save, respectively.
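
If you want to post-process the saved depth maps in Python, the snippet below is a minimal sketch. It assumes the results are written as 16-bit PNG files into path/to/save; please check infer_user.py for the exact output format and depth scaling, and note that example.png is only a placeholder filename.

import cv2
import matplotlib.pyplot as plt

# Assumption: depth results are stored as 16-bit PNGs in path/to/save
# (verify against infer_user.py; the stored values may need rescaling to metres).
depth = cv2.imread("path/to/save/example.png", cv2.IMREAD_UNCHANGED)
print(depth.shape, depth.dtype, depth.min(), depth.max())

plt.imshow(depth, cmap="magma")
plt.colorbar(label="stored depth value")
plt.show()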

Args

  • We recommend using --blur_mask to reduce patch artifacts, though we did not use it in our standard evaluation process.
  • --mode: select from p16, p49, and rn, where n is the number of randomly added patches (see the example after this list).
  • Please refer to infer_user.py for more details.
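
As a concrete example, here is the same command as in the User Inference section but with --mode p16 instead of --mode r128:

python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --show --show_path path/to/show --save --save_path path/to/save --mode p16 --boundary 0 --blur_mask

In general, using more patches (p49, or a larger n in rn) should improve consistency across tiles at the cost of longer runtime.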

Citation

If you find our work useful for your research, please consider citing our paper:

@article{li2023patchfusion,
    title={PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation}, 
    author={Zhenyu Li and Shariq Farooq Bhat and Peter Wonka},
    year={2023},
    eprint={2312.02284},
    archivePrefix={arXiv},
    primaryClass={cs.CV}}