ChronoDepth: Learning Temporally Consistent Video Depth from Video Diffusion Priors

This repository is the official implementation of the paper "Learning Temporally Consistent Video Depth from Video Diffusion Priors".

Jiahao Shao*, Yuanbo Yang*, Hongyu Zhou, Youmin Zhang, Yujun Shen, Vitor Guizilini, Yue Wang, Matteo Poggi, Yiyi Liao

📢 News

2024-12-03: Released inference code and checkpoint for the new version.
2024-06-11: Added an online demo; try it out with your videos for free!
2024-06-11: Added paper and inference code (this repository).

🛠️ Setup

We tested our code in the following environment: Ubuntu 22.04, Python 3.10.15, CUDA 12.1, RTX A6000.

Clone this repository.

git clone https://github.com/jhaoshao/ChronoDepth
cd ChronoDepth

Install the required packages.

conda create -n chronodepth python=3.10 -y
conda activate chronodepth
pip install -r requirements.txt
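
After installation, a quick sanity check can confirm that the environment is usable before downloading any checkpoints. The sketch below is only an illustration: the package names in ASSUMED_PACKAGES are guesses at what requirements.txt provides (they are not taken from this repository), so adjust them to match the actual file.

```python
import importlib.util
import sys

# Assumption: these packages are installed by requirements.txt.
# Edit this tuple to match the real file.
ASSUMED_PACKAGES = ("torch", "diffusers")


def check_environment(packages=ASSUMED_PACKAGES, min_python=(3, 10)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Compare only (major, minor) against the minimum version.
    if sys.version_info[:2] < min_python:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, found {found}"
        )
    # find_spec returns None when a top-level package cannot be located.
    for name in packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment looks OK" if not issues else "\n".join(issues))
```

Run it inside the activated chronodepth environment; any missing package is reported by name.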

🚀 Run inference

Run the Python script run_infer.py as follows:

python run_infer.py \
    --unet=jhshao/ChronoDepth-v1 \
    --model_base=stabilityai/stable-video-diffusion-img2vid-xt \
    --seed=1234 \
    --data_dir=assets/elephant.mp4 \
    --output_dir=./outputs \
    --denoise_steps=5 \
    --chunk_size=5 \
    --n_tokens=10 \
    --sigma_epsilon=-4.0
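
To process several videos with the same settings, the command above can be wrapped in a small driver. This is a minimal sketch, not part of the repository: build_command and run_all are hypothetical helper names, the flags are copied verbatim from the command above, and writing each video to its own output subdirectory is an assumption made to avoid relying on how run_infer.py names its output files.

```python
import subprocess
import sys
from pathlib import Path


def build_command(video, output_dir="./outputs"):
    """Assemble the run_infer.py invocation for a single video."""
    return [
        sys.executable, "run_infer.py",  # use the active environment's Python
        "--unet=jhshao/ChronoDepth-v1",
        "--model_base=stabilityai/stable-video-diffusion-img2vid-xt",
        "--seed=1234",
        f"--data_dir={video}",
        f"--output_dir={output_dir}",
        "--denoise_steps=5",
        "--chunk_size=5",
        "--n_tokens=10",
        "--sigma_epsilon=-4.0",
    ]


def run_all(video_dir, output_root="./outputs"):
    """Run inference on every .mp4 file in video_dir, one after another."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        # Assumption: one subdirectory per video keeps results separate.
        output_dir = Path(output_root) / video.stem
        # check=True stops the batch if one run exits with an error.
        subprocess.run(build_command(video, output_dir), check=True)
```

Calling run_all("assets") from the repository root would then process every .mp4 file in the assets folder in sequence.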

The most important inference settings are:

--denoise_steps: the number of steps for the denoising process.
--chunk_size: the chunk size used by sliding-window inference.
--n_tokens: the number of frames in each clip during sliding-window inference.
--sigma_epsilon: hyperparameter for our context-aware diffusion denoising.
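
One way to picture how --n_tokens and --chunk_size could interact is sketched below. This is an interpretation, not the repository's implementation: it assumes that each clip spans n_tokens frames and that the window advances by chunk_size frames, so consecutive clips overlap by n_tokens - chunk_size frames. The function name sliding_windows is hypothetical.

```python
def sliding_windows(num_frames, n_tokens=10, chunk_size=5):
    """Yield (start, end) frame ranges with end exclusive.

    Assumption (not confirmed by the repository): each window covers
    n_tokens frames and the window start advances by chunk_size frames.
    """
    if not 0 < chunk_size <= n_tokens:
        raise ValueError("expected 0 < chunk_size <= n_tokens")
    if num_frames <= 0:
        return
    start = 0
    while True:
        # Clamp the last window to the end of the video.
        end = min(start + n_tokens, num_frames)
        yield start, end
        if end >= num_frames:
            break
        start += chunk_size


# A 20-frame video with the settings from the command above:
print(list(sliding_windows(20)))  # [(0, 10), (5, 15), (10, 20)]
```

Under this reading, the defaults give each clip five frames of overlap with the previous one, which is the kind of shared context a temporally consistent method would draw on.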

✅ TODO

Release inference code and checkpoint for new version
Set up Online demo for new version
Release evaluation code
Release training code & dataset preparation

🎓 Citation

Please cite our paper if you find this repository useful: