
[CVPR 2024] PIA, your Personalized Image Animator. Animate your images with text prompts, combined with DreamBooth, to turn them into stunning videos.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

CVPR 2024 | PIA: Personalized Image Animator

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Yiming Zhang*, Zhening Xing*, Yanhong Zeng†, Youqing Fang, Kai Chen†

(*equal contribution, †corresponding author)

arXiv | Project Page | Open in OpenXLab | Third Party Colab | HuggingFace Model | Open in HuggingFace | Replicate

PIA is a personalized image animation method which can generate videos with high motion controllability and strong text and image alignment.

If you find our project helpful, please give it a star ⭐ or cite it; we would be very grateful 💖.

What's New

  • 2024/01/03 Replicate Demo & API support!
  • 2024/01/03 Colab support from camenduru!
  • 2023/12/28 Support scaled_dot_product_attention for 1024x1024 images with just 16GB of GPU memory.
  • 2023/12/25 HuggingFace demo is available now! 🤗 Hub
  • 2023/12/22 Released the PIA demo on OpenXLab and the checkpoints on Google Drive or OpenXLab

Setup

Prepare Environment

Use the following command to install a conda environment for PIA from scratch:

conda env create -f pia.yml
conda activate pia

If you prefer to install PIA on top of an existing environment, use environment-pt2.yaml for PyTorch==2.0.0. If you want to use a lower version of PyTorch (e.g. 1.13.1), use the following commands:

conda env create -f environment.yaml
conda activate pia

We strongly recommend PyTorch==2.0.0, which supports scaled_dot_product_attention for memory-efficient image animation.
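
To confirm that an installed PyTorch meets this recommendation, one option is to compare its version string against 2.0, the release in which torch.nn.functional.scaled_dot_product_attention became available. The following is a minimal sketch; the helper name supports_sdpa is hypothetical and is not part of PIA.

```python
import re


def supports_sdpa(version: str) -> bool:
    """Return True if a PyTorch version string is 2.0 or newer.

    Accepts strings such as "2.0.0", "1.13.1" or "2.0.0+cu118".
    (Hypothetical helper, not part of the PIA code base.)
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 0)


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, "->", supports_sdpa(torch.__version__))
```

Run inside the activated pia environment, the script prints the installed version and whether it is new enough for scaled_dot_product_attention.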

Download checkpoints

  • Download Stable Diffusion v1-5

    conda install git-lfs
    git lfs install
    git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDiffusion/

  • Download PIA

    git clone https://huggingface.co/Leoxing/PIA models/PIA/

  • Download Personalized Models

    bash download_bashscripts/1-RealisticVision.sh
    bash download_bashscripts/2-RcnzCartoon.sh
    bash download_bashscripts/3-MajicMix.sh

    You can also download pia.ckpt manually from Google Drive or HuggingFace.
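
As an alternative to git-lfs, the two HuggingFace repositories could also be fetched from Python. The sketch below assumes the huggingface_hub package is installed (it is not mentioned in the instructions above) and uses its snapshot_download function. The names CHECKPOINTS, repo_id_from_url and download_all are hypothetical, and the personalized models fetched by the bash scripts are not covered.

```python
from typing import Dict

HF_PREFIX = "https://huggingface.co/"

# Clone URLs and target directories copied from the git commands above.
CHECKPOINTS: Dict[str, str] = {
    "https://huggingface.co/runwayml/stable-diffusion-v1-5": "models/StableDiffusion/",
    "https://huggingface.co/Leoxing/PIA": "models/PIA/",
}


def repo_id_from_url(url: str) -> str:
    """Turn a HuggingFace clone URL into an "owner/name" repo id."""
    if not url.startswith(HF_PREFIX):
        raise ValueError(f"not a HuggingFace URL: {url}")
    return url[len(HF_PREFIX):].strip("/")


def download_all() -> None:
    """Download every repository in CHECKPOINTS into its target directory."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    for url, local_dir in CHECKPOINTS.items():
        snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)
```

Calling download_all() from the repository root would place the files under models/, in the same directories the git clone commands use.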

    Arrange the checkpoints as follows:

    └── models
        ├── DreamBooth_LoRA
        │   ├── ...
        ├── PIA
        │   ├── pia.ckpt
        └── StableDiffusion
            ├── vae
            ├── unet
            └── ...
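
Before running inference, it can help to check that the files ended up where the layout above expects them. The following is a minimal sketch using only the standard library; missing_checkpoints and EXPECTED are hypothetical names, and only the paths explicitly shown in the tree are checked (entries shown as "..." are skipped).

```python
from pathlib import Path
from typing import List

# Paths taken from the layout above; entries shown as "..." are not checked.
EXPECTED = [
    "DreamBooth_LoRA",
    "PIA/pia.ckpt",
    "StableDiffusion/vae",
    "StableDiffusion/unet",
]


def missing_checkpoints(models_dir: str = "models") -> List[str]:
    """Return the expected paths that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected checkpoint paths are present.")
```

Run from the repository root, the script lists any expected path that is absent.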
    

    Inference

    Image Animation

    Image-to-video results can be obtained by running:

    python inference.py --config=example/config/lighthouse.yaml
    python inference.py --config=example/config/harry.yaml
    python inference.py --config=example/config/majic_girl.yaml
    

    After running the commands above, you can find the results in example/result:

    Input Image | lightning, lighthouse | sun rising, lighthouse | fireworks, lighthouse
    Input Image | 1boy smiling | 1boy playing the magic fire | 1boy is waving hands
    Input Image | 1girl is smiling | 1girl is crying | 1girl, snowing

    Motion Magnitude

    You can control the motion magnitude through the parameter magnitude:

    python inference.py --config=example/config/xxx.yaml --magnitude=0 # Small Motion
    python inference.py --config=example/config/xxx.yaml --magnitude=1 # Moderate Motion
    python inference.py --config=example/config/xxx.yaml --magnitude=2 # Large Motion

    Examples:

    python inference.py --config=example/config/labrador.yaml
    python inference.py --config=example/config/bear.yaml
    python inference.py --config=example/config/genshin.yaml

    Each example shows the input image and prompt alongside the small, moderate, and large motion results. Prompts:

      • a golden labrador is running
      • 1bear is walking, ...
      • cherry blossom, ...
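
To compare the three magnitude levels for one config, the commands can be generated and run in sequence. The following is a minimal sketch that only wraps the --config and --magnitude flags shown above; magnitude_commands and run_sweep are hypothetical helper names, and the script is assumed to run from the repository root.

```python
import subprocess
import sys
from typing import List

# Labels follow the comments in the commands above.
MAGNITUDES = {0: "small", 1: "moderate", 2: "large"}


def magnitude_commands(config: str) -> List[List[str]]:
    """Build one inference command per motion magnitude for a config file."""
    return [
        [sys.executable, "inference.py", f"--config={config}", f"--magnitude={level}"]
        for level in MAGNITUDES
    ]


def run_sweep(config: str) -> None:
    """Run the three commands in order, stopping at the first failure."""
    for level, cmd in zip(MAGNITUDES, magnitude_commands(config)):
        print(f"[{MAGNITUDES[level]} motion]", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, run_sweep("example/config/labrador.yaml") would invoke inference.py three times, once for each magnitude.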

    Style Transfer

    To perform style transfer, add the --style_transfer flag to the inference command (remember to set the base model in xxx.yaml).

    Examples:

    python inference.py --config example/config/concert.yaml --style_transfer
    python inference.py --config example/config/anya.yaml --style_transfer

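    Each example shows the input image and base model alongside the results for three prompts:

      • Prompts: 1man is smiling | 1man is crying | 1man is singing. Base models: Realistic Vision, RCNZ Cartoon 3d
      • Prompts: 1girl smiling | 1girl open mouth | 1girl is crying, pout. Base model: RCNZ Cartoon 3d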

    Loop Video

    You can generate a looping video with the --loop parameter:

    python inference.py --config=example/config/xxx.yaml --loop

    Examples:

    python inference.py --config=example/config/lighthouse.yaml --loop
    python inference.py --config=example/config/labrador.yaml --loop

    Input Image | lightning, lighthouse | sun rising, lighthouse | fireworks, lighthouse
    Input Image | labrador jumping | labrador walking | labrador running

    Training

    We provide a training script for PIA. It borrows heavily from AnimateDiff, so please prepare the dataset and configuration files according to its guideline.

    After preparation, you can train the model by running the following command using torchrun:

    torchrun --nnodes=1 --nproc_per_node=1 train.py --config example/config/train.yaml

    or with Slurm:

    srun --quotatype=reserved --job-name=pia --gres=gpu:8 --ntasks-per-node=8 --ntasks=8  --cpus-per-task=4 --kill-on-bad-exit=1 python train.py --config example/config/train.yaml
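
The torchrun command above starts a single process. On a multi-GPU node, torchrun's --nproc_per_node option sets how many worker processes are launched, typically one per GPU. The sketch below builds the corresponding command line; torchrun_command is a hypothetical helper, and whether train.yaml needs further changes for multi-GPU training is not covered here.

```python
from typing import List


def torchrun_command(num_gpus: int, config: str = "example/config/train.yaml") -> List[str]:
    """Build a single-node torchrun command that starts one process per GPU."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
        "train.py",
        "--config",
        config,
    ]


if __name__ == "__main__":
    # Print the command for an 8-GPU node, matching the Slurm example's GPU count.
    print(" ".join(torchrun_command(8)))
```

With num_gpus=1 the helper reproduces the torchrun command shown above.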

    AnimateBench

    We have open-sourced AnimateBench on HuggingFace, which includes images, prompts, and configs for evaluating PIA and other image animation methods.

    BibTeX

    @inproceedings{zhang2024pia,
      title={{PIA}: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models},
      author={Zhang, Yiming and Xing, Zhening and Zeng, Yanhong and Fang, Youqing and Chen, Kai},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages={7747--7756},
      year={2024}
    }
    

    Contact Us

    Yiming Zhang: zhangyiming@pjlab.org.cn

    Zhening Xing: xingzhening@pjlab.org.cn

    Yanhong Zeng: zengyanhong@pjlab.org.cn

    Acknowledgements

    The code is built upon AnimateDiff, Tune-a-Video, and PySceneDetect.

    You may also want to try another project from our team: MMagic