
MotionClone

This repository is the official implementation of MotionClone. It is a training-free framework that enables motion cloning from a reference video for controllable text-to-video generation.

Abstract

We propose MotionClone, a training-free framework that enables motion cloning from a reference video to control text-to-video generation. We employ temporal attention in video inversion to represent the motions in the reference video and introduce primary temporal-attention guidance to mitigate the influence of noisy or very subtle motions within the attention weights. Furthermore, to assist the generation model in synthesizing reasonable spatial relationships and enhance its prompt-following capability, we propose a location-aware semantic guidance mechanism that leverages the coarse location of the foreground from the reference video and original classifier-free guidance features to guide the video generation.

MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Pengyang Ling*, Jiazi Bu*, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin
(*Equal Contribution) (Corresponding Author)

arXiv Project Page

[Teaser figure]

🖋 News

  • The latest version of our paper (v3) is available on arXiv! (2024.7.2)
  • Code released! (2024.6.29)

🏗️ Todo

  • Release Gradio demo
  • Release the MotionClone code (the first version is released and will be continuously optimized; questions and issues are welcome and will be addressed promptly)
  • Release paper

📚 Gallery

We show more results on the Project Page.

🚀 Method Overview

MotionClone comprises two core components in its guidance stage: Primary Temporal-Attention Guidance and Location-Aware Semantic Guidance, which operate in concert to provide comprehensive motion and semantic guidance for controllable video generation.
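
The snippet below is a minimal, self-contained sketch of the general idea behind such energy-based guidance during denoising, not the repository's actual implementation: two placeholder energy terms stand in for the temporal-attention and semantic objectives, and their gradient with respect to the video latent steers the update. All names and tensor shapes are purely illustrative.

import torch
import torch.nn.functional as F

def motion_energy(latent, reference_motion):
    # Stand-in for primary temporal-attention guidance: penalize deviation
    # from the motion representation extracted from the reference video.
    return F.mse_loss(latent, reference_motion)

def semantic_energy(latent, foreground_mask):
    # Stand-in for location-aware semantic guidance: encourage activity
    # inside the coarse foreground region of the reference video.
    return -(latent * foreground_mask).mean()

def guided_update(latent, reference_motion, foreground_mask, w_motion=1.0, w_semantic=0.1):
    latent = latent.detach().requires_grad_(True)
    energy = w_motion * motion_energy(latent, reference_motion) \
           + w_semantic * semantic_energy(latent, foreground_mask)
    grad, = torch.autograd.grad(energy, latent)
    return (latent - grad).detach()  # nudge the latent along the combined guidance gradient

# Toy tensors shaped (batch, channels, frames, height, width).
latent = torch.randn(1, 4, 16, 32, 32)
reference_motion = torch.randn_like(latent)
foreground_mask = (torch.rand_like(latent) > 0.5).float()
print(guided_update(latent, reference_motion, foreground_mask).shape)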

🔧 Installation (Python 3.11.3 recommended)

Set up the repository and conda environment

git clone https://github.com/Bujiazi/MotionClone.git
cd MotionClone

conda env create -f environment.yaml
conda activate motionclone
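
To verify the environment before downloading any models, a quick sanity check such as the following can be run (a convenience snippet of our own, not part of the repository):

import torch

# Confirm that PyTorch is importable and a GPU is visible.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())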

🔑 Pretrained Model Preparations

Download Stable Diffusion V1.5

git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDiffusion/

The clone command above downloads the Stable Diffusion weights directly into models/StableDiffusion/.
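
If git-lfs is unavailable, the same weights can also be fetched with the huggingface_hub Python package; this is an alternative we suggest here (assuming a recent huggingface_hub release), not part of the official instructions:

from huggingface_hub import snapshot_download

# Download the Stable Diffusion v1.5 weights into the folder expected by MotionClone.
snapshot_download(repo_id="runwayml/stable-diffusion-v1-5", local_dir="models/StableDiffusion")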

Prepare Community Models

Manually download the community checkpoint (.safetensors) from RealisticVision V5.1 and save it to models/DreamBooth_LoRA.

Prepare AnimateDiff Motion Modules

Manually download the AnimateDiff motion modules from AnimateDiff; we recommend v3_adapter_sd_v15.ckpt and v3_sd15_mm.ckpt. Save the modules to models/Motion_Module.
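
Once all weights are in place, the directory layout can be verified with a small script such as the one below (our own convenience check based on the paths listed above, not part of the repository):

from pathlib import Path

# Expected layout derived from the preparation steps above.
checks = {
    "models/StableDiffusion": "*",              # Stable Diffusion v1.5 weights
    "models/DreamBooth_LoRA": "*.safetensors",  # RealisticVision V5.1 checkpoint
    "models/Motion_Module": "*.ckpt",           # AnimateDiff motion modules
}
for folder, pattern in checks.items():
    files = list(Path(folder).glob(pattern))
    print(f"{folder}: {len(files)} file(s) {'ok' if files else 'MISSING'}")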

🎈 Quick Start

Perform DDIM Inversion

python invert.py --config configs/inference_config/fox.yaml

Perform Motion Cloning

python sample.py --config configs/inference_config/fox.yaml
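
To process several reference videos in one go, both steps can be scripted; the sketch below is our own convenience wrapper and assumes additional config files exist alongside fox.yaml in configs/inference_config/:

import glob
import subprocess

# Run DDIM inversion followed by motion cloning for every inference config.
for config in sorted(glob.glob("configs/inference_config/*.yaml")):
    subprocess.run(["python", "invert.py", "--config", config], check=True)
    subprocess.run(["python", "sample.py", "--config", config], check=True)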

📎 Citation

If you find this work helpful, please cite the following paper:

@article{ling2024motionclone,
  title={MotionClone: Training-Free Motion Cloning for Controllable Video Generation},
  author={Ling, Pengyang and Bu, Jiazi and Zhang, Pan and Dong, Xiaoyi and Zang, Yuhang and Wu, Tong and Chen, Huaian and Wang, Jiaqi and Jin, Yi},
  journal={arXiv preprint arXiv:2406.05338},
  year={2024}
}

📣 Disclaimer

This is the official code of MotionClone. The copyrights of the demo images and audio belong to community users. Feel free to contact us if you would like them removed.

💞 Acknowledgements

The code is built upon open-source repositories such as AnimateDiff; we thank all the contributors for open-sourcing their work.

🌟 Star History

[Star History Chart]