This repository is the official implementation of the paper "Learning Temporally Consistent Video Depth from Video Diffusion Priors".
Jiahao Shao*, Yuanbo Yang*, Hongyu Zhou, Youmin Zhang, Yujun Shen, Vitor Guizilini, Yue Wang, Matteo Poggi, Yiyi Liao
- 2024-12-03: Released the inference code and checkpoint for the new version.
- 2024-06-11: Added - try it out with your videos for free!
- 2024-06-11: Released the paper and inference code (this repository).
We tested our code under the following environment: Ubuntu 22.04, Python 3.10.15, CUDA 12.1, and an NVIDIA RTX A6000 GPU.
- Clone this repository:

```shell
git clone https://github.com/jhaoshao/ChronoDepth
cd ChronoDepth
```
- Install packages:

```shell
conda create -n chronodepth python=3.10 -y
conda activate chronodepth
pip install -r requirements.txt
```
Run the Python script `run_infer.py` as follows:

```shell
python run_infer.py \
    --unet=jhshao/ChronoDepth-v1 \
    --model_base=stabilityai/stable-video-diffusion-img2vid-xt \
    --seed=1234 \
    --data_dir=assets/elephant.mp4 \
    --output_dir=./outputs \
    --denoise_steps=5 \
    --chunk_size=5 \
    --n_tokens=10 \
    --sigma_epsilon=-4.0
```
Some important inference settings:

- `--denoise_steps`: number of steps for the denoising process.
- `--chunk_size`: chunk size of the sliding window for sliding-window inference.
- `--n_tokens`: number of frames in each clip for sliding-window inference.
- `--sigma_epsilon`: hyperparameter for our context-aware diffusion denoising.
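To illustrate how `--n_tokens` and `--chunk_size` interact, here is a minimal sketch (not the repository's actual implementation) of how a video could be split into overlapping clips for sliding-window inference: each clip contains `n_tokens` frames, and the window advances by `chunk_size` frames, so consecutive clips overlap by `n_tokens - chunk_size` frames.

```python
def sliding_windows(num_frames, n_tokens, chunk_size):
    """Hypothetical illustration: split frame indices [0, num_frames) into
    overlapping clips of n_tokens frames, advancing chunk_size frames per step.
    The last clip is aligned to the end of the video so no frame is dropped."""
    windows = []
    start = 0
    while True:
        end = start + n_tokens
        if end >= num_frames:
            # Final clip: take the last n_tokens frames of the video.
            windows.append(list(range(max(0, num_frames - n_tokens), num_frames)))
            break
        windows.append(list(range(start, end)))
        start += chunk_size
    return windows

# With the defaults above (n_tokens=10, chunk_size=5), a 20-frame video
# yields clips [0..9], [5..14], [10..19], each overlapping its neighbor
# by 5 frames.
print(sliding_windows(20, 10, 5))
```

Larger overlaps (smaller `chunk_size`) give each clip more shared context with its neighbors, at the cost of more denoising passes per video.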
- Release inference code and checkpoint for new version
- Set up Online demo for new version
- Release evaluation code
- Release training code & dataset preparation
Please cite our paper if you find this repository useful:
```bibtex
@misc{shao2024learningtemporallyconsistentvideo,
      title={Learning Temporally Consistent Video Depth from Video Diffusion Priors},
      author={Jiahao Shao and Yuanbo Yang and Hongyu Zhou and Youmin Zhang and Yujun Shen and Vitor Guizilini and Yue Wang and Matteo Poggi and Yiyi Liao},
      year={2024},
      eprint={2406.01493},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.01493},
}
```