
ChronoDepth: Learning Temporally Consistent Video Depth from Video Diffusion Priors

Primary language: Python. License: MIT.

This repository contains the official implementation of the paper "Learning Temporally Consistent Video Depth from Video Diffusion Priors".

Jiahao Shao*, Yuanbo Yang*, Hongyu Zhou, Youmin Zhang, Yujun Shen, Vitor Guizilini, Yue Wang, Matteo Poggi, Yiyi Liao

📢 News

2024-12-03: Released inference code and checkpoint for the new version.
2024-06-11: Added - try it out with your videos for free!
2024-06-11: Added paper and inference code (this repository).

🛠️ Setup

We tested our code in the following environment: Ubuntu 22.04, Python 3.10.15, CUDA 12.1, RTX A6000.

  1. Clone this repository.
git clone https://github.com/jhaoshao/ChronoDepth
cd ChronoDepth
  2. Install packages.
conda create -n chronodepth python=3.10 -y
conda activate chronodepth
pip install -r requirements.txt
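
After installing, a small script can compare your setup with the tested configuration listed at the start of this section. This is a minimal sketch, not part of the repository: `describe_environment` and `TESTED` are hypothetical names, and it assumes PyTorch is installed through `requirements.txt`. Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which can differ from the system CUDA toolkit.

```python
import platform
import sys

# Tested configuration reported in this README (hypothetical constant).
TESTED = {"python": "3.10", "cuda": "12.1"}


def describe_environment() -> dict:
    """Collect the local Python version and, if PyTorch is installed, its CUDA info."""
    info = {
        "os": platform.platform(),
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
        "gpu": None,
    }
    try:
        import torch  # assumption: pulled in by requirements.txt
    except ImportError:
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was compiled against; None for CPU-only builds.
    info["cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    env = describe_environment()
    for key, value in env.items():
        print(f"{key:>6}: {value}")
    # Flag any difference from the tested versions (informational only).
    for key, expected in TESTED.items():
        if env[key] != expected:
            print(f"note: {key} is {env[key]}, tested with {expected}")
```

A mismatch is not necessarily a problem; it only tells you where your setup departs from the one the authors tested.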

🚀 Run inference

Run the Python script run_infer.py as follows:

python run_infer.py \
    --unet=jhshao/ChronoDepth-v1 \
    --model_base=stabilityai/stable-video-diffusion-img2vid-xt \
    --seed=1234 \
    --data_dir=assets/elephant.mp4 \
    --output_dir=./outputs \
    --denoise_steps=5 \
    --chunk_size=5 \
    --n_tokens=10 \
    --sigma_epsilon=-4.0
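
To apply the same settings to several videos, you can generate this command from Python. This is a minimal sketch, assuming it is run from the repository root: `build_command` is a hypothetical helper, and the flags and defaults are copied from the command above. Writing each video's results to its own subfolder of `./outputs` is also an assumption, made so that results from different videos are not mixed.

```python
import subprocess
import sys
from pathlib import Path

# Defaults copied from the example command in this README.
DEFAULT_ARGS = {
    "unet": "jhshao/ChronoDepth-v1",
    "model_base": "stabilityai/stable-video-diffusion-img2vid-xt",
    "seed": 1234,
    "output_dir": "./outputs",
    "denoise_steps": 5,
    "chunk_size": 5,
    "n_tokens": 10,
    "sigma_epsilon": -4.0,
}


def build_command(video: str, **overrides) -> list[str]:
    """Build the argv list for run_infer.py on one video (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_ARGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    args = {**DEFAULT_ARGS, **overrides, "data_dir": video}
    # Use the --flag=value form, as in the README, so negative values such as
    # -4.0 are never mistaken for option names.
    return [sys.executable, "run_infer.py"] + [f"--{k}={v}" for k, v in args.items()]


if __name__ == "__main__":
    if not Path("run_infer.py").exists():
        print("run_infer.py not found; run this from the ChronoDepth root")
    else:
        for video in sorted(Path("assets").glob("*.mp4")):
            # One output folder per video (assumption: keeps results separate).
            cmd = build_command(str(video), output_dir=f"./outputs/{video.stem}")
            subprocess.run(cmd, check=True)  # stop on the first failure
```

Overrides are checked against the known flags, so a typo such as `denoise_step=10` fails early instead of being passed to the script.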

Some important inference settings:

  • --denoise_steps: the number of denoising steps.
  • --chunk_size: the chunk size of the sliding window used in sliding-window inference.
  • --n_tokens: the number of frames in each clip processed during sliding-window inference.
  • --sigma_epsilon: a hyperparameter of our context-aware diffusion denoising.
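
The exact windowing rule used by run_infer.py is not spelled out here. The sketch below illustrates one plausible reading, which is an assumption: each clip holds `n_tokens` frames and consecutive clips start `chunk_size` frames apart, so neighbouring clips overlap by `n_tokens - chunk_size` frames. With the defaults (10 and 5), both the stride and the overlap are 5 frames. The function `sliding_windows` is hypothetical and not taken from the repository.

```python
def sliding_windows(num_frames: int, n_tokens: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return [start, end) frame ranges of overlapping clips (hypothetical layout).

    Assumes each clip holds n_tokens frames and consecutive clips start
    chunk_size frames apart, so neighbouring clips share
    n_tokens - chunk_size frames.
    """
    if not 0 < chunk_size <= n_tokens:
        raise ValueError("need 0 < chunk_size <= n_tokens")
    if num_frames <= n_tokens:
        # Short video: a single clip covers everything.
        return [(0, num_frames)]
    windows = []
    start = 0
    while start + n_tokens < num_frames:
        windows.append((start, start + n_tokens))
        start += chunk_size
    # Align the final clip to the last frame so it keeps the full length.
    windows.append((num_frames - n_tokens, num_frames))
    return windows


if __name__ == "__main__":
    print(sliding_windows(23, n_tokens=10, chunk_size=5))
```

For example, `sliding_windows(20, 10, 5)` returns `[(0, 10), (5, 15), (10, 20)]`, and for 23 frames the last clip becomes `(13, 23)` so that it still spans 10 frames.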

✅ TODO

  • Release inference code and checkpoint for the new version
  • Set up an online demo for the new version
  • Release evaluation code
  • Release training code & dataset preparation

🎓 Citation

Please cite our paper if you find this repository useful:

@misc{shao2024learningtemporallyconsistentvideo,
      title={Learning Temporally Consistent Video Depth from Video Diffusion Priors}, 
      author={Jiahao Shao and Yuanbo Yang and Hongyu Zhou and Youmin Zhang and Yujun Shen and Vitor Guizilini and Yue Wang and Matteo Poggi and Yiyi Liao},
      year={2024},
      eprint={2406.01493},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.01493}, 
}