
AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI


AnimateDiff for Stable Diffusion Webui

This extension aims to integrate AnimateDiff into AUTOMATIC1111 Stable Diffusion WebUI. I have tested it with WebUI v1.4.1 on Ubuntu 20.04 with an NVIDIA 3090. After enabling this extension, you can generate GIFs in exactly the same way as you generate images.

This extension implements AnimateDiff differently from the original repository. It does not require you to clone the whole SD1.5 repository. It also applies (probably) the least possible modification to ldm, so you do not need to reload your model weights if you don't want to.
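The inject-and-restore idea behind "no need to reload your model weights" can be pictured with a toy model. This is a hypothetical sketch: `Block`, `inject_motion_modules` and `restore` are illustrative names, and the real extension patches ldm's UNet modules rather than a Python list of dataclasses. Because motion modules are only inserted next to existing blocks and later removed, the original blocks (and their weights) are never modified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    name: str
    injected: bool = False  # True only for motion modules added by us


def inject_motion_modules(blocks: List[Block], after_indices: List[int]) -> None:
    """Insert a motion module after each listed block, in place (hypothetical layout)."""
    # Insert from the highest index down so earlier insertions don't shift later targets.
    for i in sorted(after_indices, reverse=True):
        blocks.insert(i + 1, Block(f"motion_after_{blocks[i].name}", injected=True))


def restore(blocks: List[Block]) -> None:
    """Remove every injected module, leaving the original blocks untouched."""
    blocks[:] = [b for b in blocks if not b.injected]


unet_output_blocks = [Block(f"out{i}") for i in range(4)]
original = list(unet_output_blocks)
inject_motion_modules(unet_output_blocks, [0, 2])
print([b.name for b in unet_output_blocks])
# ['out0', 'motion_after_out0', 'out1', 'out2', 'motion_after_out2', 'out3']
restore(unet_output_blocks)
assert unet_output_blocks == original  # same objects, nothing to reload
```

The indices passed to `inject_motion_modules` are where an off-by-one bites: a wrong index still runs, but puts the module in the wrong part of the UNet.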

Internally, the WebUI batch size is replaced by the GIF frame number: 1 full GIF is generated in 1 batch. If you want to generate multiple GIFs at once, change the batch number instead.

Batch number is NOT the same as batch size. In A1111 WebUI, the batch number setting is above batch size. Batch number is the number of batches run sequentially, while batch size is the number of images generated in parallel within one batch. You do not have to worry much when you increase batch number, but you do need to watch your VRAM when you increase batch size (which, in this extension, is the video frame number). You do not need to change batch size at all when you are using this extension.
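To make the distinction concrete, here is a minimal sketch, assuming SD1.5's 4-channel latents at 1/8 of the image resolution. `plan_generation` and `latent_size` are hypothetical helpers, not part of the extension. Batch number only sets how many batches run one after another, while the frame number (the internal batch size) sets the first dimension of every latent tensor, and therefore how much has to fit in VRAM at once. Latent size is only a proxy here; real VRAM use is dominated by UNet activations.

```python
import math
from typing import List, Tuple

LATENT_CHANNELS = 4  # SD1.5 latents have 4 channels
VAE_FACTOR = 8       # SD1.5's VAE shrinks each image side by 8x

Shape = Tuple[int, int, int, int]


def plan_generation(batch_number: int, frames: int, width: int, height: int) -> List[Shape]:
    """Latent shape of each batch; batches run one after another, one GIF per batch."""
    # Frames play the role of batch size: they are denoised together in one tensor.
    shape = (frames, LATENT_CHANNELS, height // VAE_FACTOR, width // VAE_FACTOR)
    # Batch number only repeats the whole batch sequentially; memory is reused between runs.
    return [shape] * batch_number


def latent_size(shape: Shape) -> int:
    """Number of latent elements held in memory at once for one batch."""
    return math.prod(shape)


plan = plan_generation(batch_number=3, frames=16, width=512, height=512)
print(len(plan), plan[0])  # 3 (16, 4, 64, 64)
# Doubling frames doubles the per-batch latent; doubling batch_number does not.
print(latent_size(plan_generation(1, 32, 512, 512)[0]) // latent_size(plan[0]))  # 2
```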

You might also be interested in another extension I created: Segment Anything for Stable Diffusion WebUI.

How to Use

  1. Install this extension via link (in WebUI: Extensions → Install from URL).
  2. Download motion modules from Google Drive | HuggingFace | CivitAI | Baidu NetDisk. You only need to download one of mm_sd_v14.ckpt | mm_sd_v15.ckpt. Put the model weights under sd-webui-animatediff/model/. DO NOT change the model filename.
  3. Go to txt2img if you want to try txt2gif and img2img if you want to try img2gif.
  4. Choose an SD1.5 checkpoint, write prompts, set configurations such as image width/height. If you want to generate multiple GIFs at once, please change batch number, instead of batch size.
  5. Enable the AnimateDiff extension, set up each parameter (loop number means how many times the GIF loops when displayed, not the actual length of the GIF), and click Generate.
  6. You should see the output GIF in the output gallery. You can access the GIF output at stable-diffusion-webui/outputs/{txt2img or img2img}-images/AnimateDiff. You can also access the individual image frames at stable-diffusion-webui/outputs/{txt2img or img2img}-images/{date}.
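Step 5's point that the loop number is not the GIF's length follows from how an animated GIF is timed. A minimal sketch, assuming a frames-per-second setting (`fps`) alongside the frame number; the helper names are hypothetical. The returned options are real keyword arguments of Pillow's `Image.save` for animated GIFs (`save_all`, `duration` in milliseconds per frame, `loop`), though the extension may write its GIFs differently.

```python
def frame_duration_ms(fps: int) -> int:
    """Per-frame display time in milliseconds (GIF itself stores delays in 1/100 s)."""
    return round(1000 / fps)


def clip_length_seconds(frames: int, fps: int) -> float:
    """Length of ONE pass through the GIF; the loop number does not enter into it."""
    return frames * frame_duration_ms(fps) / 1000


def pillow_gif_options(fps: int, loop_number: int) -> dict:
    """Keyword arguments for PIL's Image.save when writing an animated GIF."""
    # In Pillow, loop=0 means "loop forever"; other values set the loop count.
    return {"save_all": True, "duration": frame_duration_ms(fps), "loop": loop_number}


print(clip_length_seconds(16, 8))  # 2.0, whether the GIF loops once or forever
```

With Pillow, these options would be used as `images[0].save(path, append_images=images[1:], **pillow_gif_options(fps, loop_number))`, where `images` is a list of PIL images.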

Update

  • 2023/07/20 v1.1.0: fix gif duration, add loop number, remove auto-download, remove xformers, remove instructions on gradio UI, refactor README, add sponsor QR code.
  • 2023/07/24 v1.2.0: fix incorrect insertion of motion modules, add option to change path to save motion modules in Settings/AnimateDiff, fix loading different motion modules.
  • 2023/07/27 v1.2.1: add hash calculation of motion modules (you can disable it in Settings/AnimateDiff)
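The hash step added in v1.2.1 can be sketched as follows, together with a check that a motion module sits in the model folder under its original filename. This is an assumption-laden sketch: the README does not say which hash the extension computes, so SHA-256 is used here for illustration, and `find_motion_module` and `file_sha256` are hypothetical names. The folder is a parameter because v1.2.0 lets you change it in Settings/AnimateDiff.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Filenames from the install step; the extension expects them unchanged.
EXPECTED_NAMES = ("mm_sd_v14.ckpt", "mm_sd_v15.ckpt")


def find_motion_module(model_dir: str) -> Optional[Path]:
    """Return the first expected motion-module file found in model_dir, or None."""
    for name in EXPECTED_NAMES:
        candidate = Path(model_dir) / name
        if candidate.is_file():
            return candidate
    return None


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 (assumed algorithm) of a file, read in 1 MiB chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    module = find_motion_module("sd-webui-animatediff/model")
    if module is None:
        print("No motion module found; check the filename and folder.")
    else:
        print(module.name, file_sha256(module))
```

Reading in fixed-size chunks keeps memory use constant however large the checkpoint is, at the cost of reading the whole file from disk.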

TODO

This TODO list will most likely be resolved sequentially.

  • fix greyer samples
  • other attention optimizations (e.g. sdp)
  • fix img2img
  • token
  • shape
  • reddit

FAQ

  1. Q: I am using a remote server which blocks Google. What should I do?

    A: You will have to download the motion modules locally and then upload them to your server.

  2. Q: How much VRAM do I need?

    A: Currently, you can run WebUI with this extension on an NVIDIA 3090. I cannot guarantee that it works on other GPUs. Actual VRAM usage depends on your image size and video frame number; reduce either one to lower VRAM usage. The default setting (shown in the Samples/txt2img section) consumes 12GB of VRAM. More VRAM info will be added later.

  3. Q: Can I generate a video instead of a GIF?

    A: Unfortunately, you cannot. A whole batch of frames passes through a transformer module at once, which prevents us from generating videos sequentially. We look forward to future developments in deep learning for video generation.

  4. Q: Can I use SDXL to generate GIFs?

    A: At least at this time, you cannot. This extension essentially injects multiple motion modules into the SD1.5 UNet. It does not work for other variations of SD, such as SD2.1 and SDXL. I'm not sure what will happen if you force-add motion modules to SD2.1 or SDXL. Future experiments are needed.

  5. Q: Can I use this extension to do gif2gif?

    A: Due to the 1-batch behavior of AnimateDiff, it is probably not possible to support gif2gif. However, I need to discuss this with the authors of AnimateDiff.

  6. Q: Can I use xformers?

    A: Yes, but it will not be applied to AnimateDiff due to a weird bug. I will try other optimizations. Note that xformers will change the GIF you generate.

  7. Q: This extension performs worse than AnimateDiff. There seems to be no motion, only glitches. Why?

    A: Because I inserted the motion modules into the wrong place inside the UNet output blocks. It was a very silly typo (I wrote a 2 where it was supposed to be a 3) but took me days to discover. This was fixed in v1.2.0.

  8. Q: How can I reproduce the result in the Samples/txt2img section?

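    A: You must replace the noise generated by create_random_tensors with

        import torch
        from einops import rearrange
        from modules import shared  # A1111 WebUI module; provides shared.device
        torch.manual_seed(<seed>)  # replace <seed> with your integer seed
        # Noise as (c=4, f=16, h=64, w=64), i.e. 16 frames at 512x512; frames first for WebUI
        x = rearrange(torch.randn((4, 16, 64, 64), device=shared.device), 'c f h w -> f c h w')

    and retry. A1111 generates random tensors in a completely different way, so the same seed gives different initial noise.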

  9. Q: v1.2.0 does not work for img2img. Why?

    A: I don't know. I will try to figure out why very soon.

  10. Q: v1.2.0 seems to give a greyer sample compared to AnimateDiff. Why?

    A: I don't know. I will try to figure out why very soon.
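FAQ 3 and 5 hinge on the fact that AnimateDiff's motion modules are temporal transformers: attention runs along the frame axis, so every frame's output depends on every other frame in the batch. The toy below illustrates just that property at a single spatial location, with identity query/key/value projections and no positional encoding. It is a simplified sketch, not AnimateDiff's motion module; `temporal_self_attention` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def temporal_self_attention(frames: List[Vector]) -> List[Vector]:
    """Toy self-attention along the frame axis at ONE spatial location.

    Queries, keys and values are the frame features themselves (no learned
    projections), which is enough to show that each output mixes all frames.
    """
    dim = len(frames[0])
    outputs = []
    for q in frames:
        # Scaled dot-product score of this frame against every frame.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in frames]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of all frames' values.
        outputs.append([sum(w * v[d] for w, v in zip(weights, frames)) / total for d in range(dim)])
    return outputs


frames = [[0.1 * i, 1.0 - 0.1 * i] for i in range(16)]
before = temporal_self_attention(frames)[0]
frames[15] = [5.0, -5.0]  # change only the LAST frame
after = temporal_self_attention(frames)[0]
print(before != after)  # True: frame 0's output depends on frame 15
```

Because frame 0 cannot be computed without frame 15, the frames of one GIF must be denoised together in one batch rather than streamed one by one.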

Samples

txt2img

[Comparison image: AnimateDiff vs. A1111 (this extension), sample 00023-10788741199826055168]

img2img

v1.2.0 does not work for img2img for an unknown reason. This will be fixed later.

Sponsor

You can sponsor me via WeChat or Alipay.

[QR codes: WeChat | Alipay]