This repository contains the official implementation of the paper "Learning Temporally Consistent Video Depth from Video Diffusion Priors".
Jiahao Shao*, Yuanbo Yang*, Hongyu Zhou, Youmin Zhang, Yujun Shen, Matteo Poggi, Yiyi Liao
2024-06-11: Added an online demo: try it out with your videos for free!
2024-06-11: Added paper and inference code (this repository).
We tested our code under the following environment: Ubuntu 20.04, Python 3.10.14, CUDA 11.3, RTX A6000.
- Clone this repository:
git clone https://github.com/jhaoshao/ChronoDepth
cd ChronoDepth
- Install dependencies:
conda create -n chronodepth python=3.10
conda activate chronodepth
pip install -r requirements.txt
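Optionally, you can run a quick sanity check before inference. This is a minimal sketch, assuming PyTorch is installed by requirements.txt; it only confirms that the interpreter version and the CUDA device are visible:

```python
# Minimal environment check (assumes PyTorch was installed via requirements.txt).
import sys
import torch

print("Python:", sys.version.split()[0])          # expect 3.10.x
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```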
Run the Python script run_infer.py as follows:
python run_infer.py \
    --model_base=jhshao/ChronoDepth \
    --data_dir=assets/sora_e2.mp4 \
    --output_dir=./outputs \
    --num_frames=10 \
    --denoise_steps=10 \
    --window_size=9 \
    --half_precision \
    --seed=1234
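If you want to process several videos with the same settings, a small wrapper can invoke the script once per file. This is a hypothetical helper, not part of the repository; it only assumes that --data_dir accepts a single .mp4 path, as in the command above:

```python
# Hypothetical batch helper (not part of this repository): runs run_infer.py
# on every .mp4 in the assets/ folder with the same settings as above.
import subprocess
from pathlib import Path

for video in sorted(Path("assets").glob("*.mp4")):
    subprocess.run([
        "python", "run_infer.py",
        "--model_base=jhshao/ChronoDepth",
        f"--data_dir={video}",
        "--output_dir=./outputs",
        "--num_frames=10",
        "--denoise_steps=10",
        "--window_size=9",
        "--half_precision",
        "--seed=1234",
    ], check=True)
```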
Inference settings:
- --num_frames: sets the number of frames in each video clip.
- --denoise_steps: sets the number of denoising steps.
- --window_size: sets the size of the sliding window. When the sliding window size equals the number of frames, each clip is inferred separately (see the sketch after this list).
- --half_precision: runs inference in half precision (16-bit float), which may lead to slightly worse results but speeds up inference.
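To make the interplay between --num_frames and --window_size concrete, below is a minimal sketch of one plausible interpretation: clips of num_frames frames are taken at a stride of window_size, so consecutive clips overlap by num_frames - window_size frames, and when window_size equals num_frames the clips are disjoint. This is illustrative only; the actual logic lives in run_infer.py and may differ.

```python
# Illustrative sketch of a sliding-window split over video frames.
# Assumption: clips start every `window_size` frames and contain `num_frames`
# frames each, overlapping by num_frames - window_size frames.
def split_into_clips(total_frames: int, num_frames: int = 10, window_size: int = 9):
    """Return (start, end) index pairs for each clip to be inferred."""
    clips = []
    start = 0
    while start < total_frames:
        end = min(start + num_frames, total_frames)
        clips.append((start, end))
        if end == total_frames:
            break
        start += window_size
    return clips

print(split_into_clips(30))  # [(0, 10), (9, 19), (18, 28), (27, 30)]
```

With window_size=9 and num_frames=10, each clip shares one frame with the previous one; setting window_size=10 would make the clips disjoint, i.e. fully separate inference per clip.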
Please cite our paper if you find this repository useful:
@misc{shao2024learning,
  title={Learning Temporally Consistent Video Depth from Video Diffusion Priors},
  author={Jiahao Shao and Yuanbo Yang and Hongyu Zhou and Youmin Zhang and Yujun Shen and Matteo Poggi and Yiyi Liao},
  year={2024},
  eprint={2406.01493},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}