
[WACV 2025] Follow-Your-Handle: this repository is the official implementation of "MagicStick: Controllable Video Editing via Control Handle Transformations".


MagicStick🪄: Controllable Video Editing via Control Handle Transformations

Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li and Qifeng Chen


"Zoom in the bird" "Move the parrot" "Zoom in the rabbit & rabbit ➜ tiger"

🎏 Abstract

TL;DR: MagicStick is the first unified framework for editing video properties (e.g., shape, size, location, motion) by applying keyframe transformations to the extracted internal control signals.
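As a rough illustration of this idea (the function below and its uniform-zoom edit are hypothetical sketches, not the repository's actual API), the snippet applies one geometric "handle" transformation to the extracted control signal of every frame:

```python
# Hypothetical sketch: apply one geometric "handle" transformation (here a
# zoom) to the per-frame control signal (e.g., edge maps). The function name
# and the uniform schedule are illustrative, not MagicStick's actual code.
import cv2
import numpy as np

def zoom_edge_maps(edge_maps: np.ndarray, scale: float = 1.5) -> np.ndarray:
    """edge_maps: (num_frames, H, W) uint8 edge maps extracted per frame."""
    num_frames, h, w = edge_maps.shape
    # Scale about the frame center; angle=0 makes this a pure zoom.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), 0.0, scale)
    edited = np.stack(
        [cv2.warpAffine(frame, M, (w, h)) for frame in edge_maps]
    )
    return edited  # transformed control signal used as generation guidance
```

In the full method, the transformed control signal is then fed to the temporally inflated ControlNet to guide both inversion and generation.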

Full abstract: Text-based video editing has recently attracted considerable interest for changing the style of a video or replacing objects with others of similar structure. Beyond this, we demonstrate that properties such as shape, size, location, and motion can also be edited in videos. Our key insight is that transformations of a keyframe's specific internal features (e.g., object edge maps or human pose) can easily be propagated to other frames to provide generation guidance. We thus propose MagicStick, a controllable video editing method that edits video properties by applying transformations to the extracted internal control signals. In detail, to preserve appearance, we inflate both the pretrained image diffusion model and ControlNet to the temporal dimension and train low-rank adaptation (LoRA) layers to fit the specific scene. For editing, we then run an inversion-and-editing framework; unlike prior work, the fine-tuned ControlNet is introduced in both inversion and generation for attention guidance, together with the proposed attention remix between the spatial attention maps of inversion and editing. Though succinct, ours is the first method to demonstrate video property editing from a pre-trained text-to-image model. We present experiments on numerous examples within our unified framework and compare with shape-aware text-based editing and handcrafted motion video generation, demonstrating superior temporal consistency and editing capability over previous works.
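To make the inflation step above concrete, here is a minimal, hypothetical sketch (assuming PyTorch and einops; the class name and layer choices are illustrative, not the repository's implementation) of turning a pretrained 2D convolution into a pseudo-3D video layer: the spatial weights run per frame, and a zero-initialized temporal convolution is added so the inflated model initially behaves exactly like the image model.

```python
# Minimal sketch of temporal inflation, in the spirit described above
# (not the repo's actual code): fold time into the batch for the spatial
# conv, then mix along the frame axis with a zero-initialized temporal conv.
import torch.nn as nn
from einops import rearrange

class InflatedConv2d(nn.Module):
    def __init__(self, conv2d: nn.Conv2d):
        super().__init__()
        self.conv2d = conv2d  # pretrained spatial weights, reused as-is
        self.temporal = nn.Conv1d(conv2d.out_channels, conv2d.out_channels,
                                  kernel_size=3, padding=1)
        # Zero init: at step 0 the inflated layer equals the image layer.
        nn.init.zeros_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        b, _, f, _, _ = x.shape
        x = rearrange(x, "b c f h w -> (b f) c h w")
        x = self.conv2d(x)                 # per-frame spatial convolution
        _, c, h, w = x.shape
        x = rearrange(x, "(b f) c h w -> (b h w) c f", b=b, f=f)
        x = x + self.temporal(x)           # residual temporal mixing
        return rearrange(x, "(b h w) c f -> b c f h w", b=b, h=h, w=w)
```

The zero initialization is the key design choice: it lets fine-tuning (e.g., the LoRA layers) learn temporal consistency gradually without disturbing the pretrained appearance prior.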

📋 Changelog

  • 2023.12.01 Release Code and Paper!

🚧 Todo

  • Release the editing configs and data for all results, plus the Tune-A-Video optimization
  • Memory and runtime profiling, and editing guidance documents
  • Colab and Hugging Face demos
  • Code refactoring
  • Time and memory optimization
  • Release more applications

Object Size Editing

We show the difference between the source prompt and the target prompt in the box below each video.

Note: the mp4 and GIF files on this GitHub page are compressed. Please check our Project Page for the original mp4 video editing results.

Object Position Editing

Object Appearance Editing

"Truck ➜ Bus" "Truck ➜ Train"
"A swan ➜ A flamingo" "A swan ➜ A duck"

📀 Demo Video

demo.mp4

📍 Citation

If you find this project helpful, please feel free to leave a star ⭐️ and cite our paper:

@article{ma2023magicstick,
  title   = {MagicStick: Controllable Video Editing via Control Handle Transformations},
  author  = {Ma, Yue and Cun, Xiaodong and He, Yingqing and Qi, Chenyang and Wang, Xintao and Shan, Ying and Li, Xiu and Chen, Qifeng},
  journal = {arXiv preprint arXiv:2312.03047},
  year    = {2023},
}

💗 Acknowledgements

This repository borrows heavily from FateZero and FollowYourPose. Thanks to the authors for sharing their code and models.

🧿 Maintenance

This is the codebase for our research work. We are still working hard to update this repo, and more details will be released in the coming days. If you have any questions or ideas to discuss, feel free to contact Yue Ma or Xiaodong Cun.