
[IEEE TVCG 2024] Customized Video Generation Using Textual and Structural Guidance


Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

           

Jinbo Xing, Menghan Xia*, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu,
Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong


(* corresponding author)

From CUHK and Tencent AI Lab.

IEEE TVCG 2024

🔆 Introduction

Make-Your-Video is a customized video generation model controlled by both text and motion structure (depth). It inherits rich visual concepts from an image latent diffusion model (LDM) and supports inference on longer videos.

🤗 Applications

Real-life scene to video

Columns: Real-life scene | Ours | Text2Video-zero+CtrlNet | LVDMExt+Adapter
Prompts:
  • "A dam discharging water"
  • "A futuristic rocket ship on a launchpad, with sleek design, glowing lights"

3D scene modeling to video

Columns: Real-life scene | Ours | Text2Video-zero+CtrlNet | LVDMExt+Adapter
Prompts:
  • "A train on the rail, 2D cartoon style"
  • "A Van Gogh style painting on drawing board in park, some books on the picnic blanket, photorealistic"
  • "A Chinese ink wash landscape painting"

Video re-rendering

Columns: Original video | Ours | SD-Depth | Text2Video-zero+CtrlNet | LVDMExt+Adapter | Tune-A-Video
Prompts:
  • "A tiger walks in the forest, photorealistic"
  • "An origami boat moving on the sea"
  • "A camel walking on the snow field, Miyazaki Hayao anime style"

🌟 Method Overview

📝 Changelog

  • [2023.11.30]: 🔥🔥 Release the main model.
  • [2023.06.01]: 🔥🔥 Create this repo and launch the project webpage.

🧰 Models

Model | Resolution | Checkpoint
MakeYourVideo256 | 256x256 | Hugging Face

It takes approximately 13 seconds and requires a peak GPU memory of 20 GB to animate an image using a single NVIDIA A100 (40 GB) GPU.

⚙️ Setup

Install Environment via Anaconda (Recommended)

conda create -n makeyourvideo python=3.8.5
conda activate makeyourvideo
pip install -r requirements.txt
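
Once the environment is active, a quick sanity check can confirm that the interpreter matches the version pinned in the conda command. The sketch below is illustrative only: the helper name `matches_pinned` and the choice to compare just the major and minor version are assumptions, not part of this repository.

```python
import sys

# Major and minor version taken from `python=3.8.5` in the conda command above.
PINNED = (3, 8)


def matches_pinned(version_info=sys.version_info, pinned=PINNED):
    """Return True when the interpreter's major.minor equals the pinned one.

    Hypothetical helper for illustration; the patch version is ignored.
    """
    return tuple(version_info[:2]) == tuple(pinned)
```

Calling `matches_pinned()` inside the activated `makeyourvideo` environment should return `True` if the environment was created with the command above.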

💫 Inference

1. Command line

  1. Download the pre-trained depth estimation model from Hugging Face and save dpt_hybrid-midas-501f0c75.pt as checkpoints/depth/dpt_hybrid-midas-501f0c75.pt.
  2. Download the pre-trained Make-Your-Video model from Hugging Face and save model.ckpt as checkpoints/makeyourvideo_256_v1/model.ckpt.
  3. Run the following command in a terminal:
  sh scripts/run.sh
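
Before launching the script, it can help to confirm that both checkpoints sit at the paths listed in the steps above. The following is a minimal sketch using only the Python standard library; the helper name `missing_checkpoints` is hypothetical, and it assumes the paths are relative to the repository root.

```python
from pathlib import Path

# Checkpoint locations copied from the inference steps above,
# assumed to be relative to the repository root.
EXPECTED_CHECKPOINTS = [
    Path("checkpoints/depth/dpt_hybrid-midas-501f0c75.pt"),
    Path("checkpoints/makeyourvideo_256_v1/model.ckpt"),
]


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that are not files under repo_root.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    root = Path(repo_root)
    return [p for p in EXPECTED_CHECKPOINTS if not (root / p).is_file()]
```

Calling `missing_checkpoints()` from the repository root returns an empty list when both files are in place; any path it returns still needs to be downloaded.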

👨‍👩‍👧‍👦 Other Interesting Open-source Projects

VideoCrafter1: Framework for high-quality video generation.

DynamiCrafter: Open-domain image animation methods using video diffusion priors.

Play with these projects in the same conda environment!

😉 Citation

@article{xing2023make,
  title={Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance},
  author={Xing, Jinbo and Xia, Menghan and Liu, Yuxin and Zhang, Yuechen and Zhang, Yong and He, Yingqing and Liu, Hanyuan and Chen, Haoxin and Cun, Xiaodong and Wang, Xintao and others},
  journal={arXiv preprint arXiv:2306.00943},
  year={2023}
}

📢 Disclaimer

We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.


🌞 Acknowledgement

We gratefully acknowledge the Visual Geometry Group of the University of Oxford for collecting the WebVid-10M dataset, and we follow the corresponding terms of access.