
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion


MotionFollower

This repository is the official implementation of MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion.

Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang


Demo video (demo.mp4). Columns, left to right: Source, Target, MotionEditor, MotionFollower.



News

  • 🌟 [June, 2024] We have released the inference code of MotionFollower. The training code, our proposed mask prediction model, and the data preprocessing details will be made public as soon as possible. Please stay tuned!

Abstract

Despite impressive advancements in diffusion-based video editing models in altering video attributes, there has been limited exploration into modifying motion information while preserving the original protagonist's appearance and background. In this paper, we propose MotionFollower, a lightweight score-guided diffusion model for video motion editing. To introduce conditional controls to the denoising process, MotionFollower leverages two of our proposed lightweight signal controllers, one for poses and the other for appearances, both of which consist of convolution blocks without involving heavy attention calculations. Further, we design a score guidance principle based on a two-branch architecture, including the reconstruction and editing branches, which significantly enhances the modeling capability of texture details and complicated backgrounds. Concretely, we enforce several consistency regularizers and losses during the score estimation. The resulting gradients thus inject appropriate guidance into the intermediate latents, forcing the model to preserve the original background details and protagonists' appearances without interfering with the motion modification. Experiments demonstrate the competitive motion editing ability of MotionFollower qualitatively and quantitatively. Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory while delivering superior motion editing performance and exclusively supporting large camera movements and actions.

Setup

Requirements

pip install -r requirements.txt

Installing xformers is highly recommended for better memory efficiency and speed on GPUs. To enable xformers, set enable_xformers_memory_efficient_attention=True (the default).
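Since xformers is optional, it can help to check whether it is installed before running inference. The snippet below is a minimal sketch using only the Python standard library. The helper name xformers_available is hypothetical and not part of this repository, and how the resulting value is passed to the configuration is left as an assumption.

```python
import importlib.util


def xformers_available() -> bool:
    """Return True if the xformers package can be imported in this environment.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    return importlib.util.find_spec("xformers") is not None


# Assumption: the result is used as the value for the
# enable_xformers_memory_efficient_attention setting named above.
enable_xformers_memory_efficient_attention = xformers_available()
print("xformers installed:", enable_xformers_memory_efficient_attention)
```

If this prints False, either install xformers or set the option to False.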

Weights

[Stable Diffusion] Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The pre-trained Stable Diffusion model can be downloaded from Hugging Face. The pre-trained VAE model can be downloaded from Hugging Face.

[MotionFollower] MotionFollower is a lightweight score-guided diffusion model for video motion editing. The checkpoints are available on Google Drive.

Inference

For case-i (i = 1, 2, 3, 4, 5):

python inference-sde.py --config /path/configs/inference/inferece_sde.yaml --video_root ./configs/inference/case-i/source_images --pose_root ./configs/inference/case-i/target_aligned_poses --ref_pose_root ./configs/inference/case-i/source_poses --source_mask_root ./configs/inference/case-i/source_masks --target_mask_root ./configs/inference/case-i/predicted_masks --cfg 7.0

For case-k (k = 6, 7):

python inference-sde.py --config /path/configs/inference/inferece_sde.yaml --video_root ./configs/inference/case-k/source_images --pose_root ./configs/inference/case-k/target_aligned_poses --ref_pose_root ./configs/inference/case-k/source_poses --source_mask_root ./configs/inference/case-k/source_masks --target_mask_root ./configs/inference/case-k/predicted_masks --camera True

If the source images are PNG files, add --suffix png to the command.
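The two commands above differ only in the case directory and in the final flag (--cfg 7.0 for cases 1 to 5, --camera True for cases 6 and 7). The following is a minimal sketch of a helper that assembles the argument list for a given case. The function build_inference_command and its parameters are hypothetical and not part of this repository. The flags and directory names are copied from the commands above, and the config path is passed in unchanged.

```python
import shlex

# Cases that the commands above run with --camera True.
CAMERA_CASES = {6, 7}


def build_inference_command(case, config_path,
                            case_root="./configs/inference", suffix=None):
    """Assemble the inference-sde.py argument list for one example case.

    Hypothetical helper for illustration; it only mirrors the commands
    shown above and does not run anything.
    """
    case_dir = f"{case_root}/case-{case}"
    cmd = [
        "python", "inference-sde.py",
        "--config", config_path,
        "--video_root", f"{case_dir}/source_images",
        "--pose_root", f"{case_dir}/target_aligned_poses",
        "--ref_pose_root", f"{case_dir}/source_poses",
        "--source_mask_root", f"{case_dir}/source_masks",
        "--target_mask_root", f"{case_dir}/predicted_masks",
    ]
    if case in CAMERA_CASES:
        cmd += ["--camera", "True"]   # cases 6 and 7
    else:
        cmd += ["--cfg", "7.0"]       # cases 1 to 5
    if suffix is not None:
        cmd += ["--suffix", suffix]   # e.g. "png" for PNG source images
    return cmd


# Print the command for case 1 so it can be inspected or pasted into a shell.
print(shlex.join(build_inference_command(
    1, "/path/configs/inference/inferece_sde.yaml")))
```

The returned list can be passed to subprocess.run from the repository root; replace /path with the location of your checkout first.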

Contact

If you have any suggestions or find our work helpful, feel free to contact us.

Email: francisshuyuan@gmail.com

If you find our work useful, please consider citing it:

@article{tu2024motionfollower,
  title={MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion},
  author={Shuyuan Tu and Qi Dai and Zihao Zhang and Sicheng Xie and Zhi-Qi Cheng and Chong Luo and Xintong Han and Zuxuan Wu and Yu-Gang Jiang},
  journal={arXiv preprint arXiv:2405.20325},
  year={2024}
}