AnimateDiff for Stable Diffusion WebUI

This extension aims to integrate AnimateDiff into AUTOMATIC1111 Stable Diffusion WebUI. I have tested this extension with WebUI v1.4.1 on Ubuntu 20.04 with an NVIDIA 3090. After enabling this extension, you can generate GIFs in exactly the same way as you generate images.

This extension implements AnimateDiff in a different way. It does not require you to clone the whole SD1.5 repository. It also applies (probably) the fewest possible modifications to ldm, so you do not need to reload your model weights unless you want to.

Internally, the batch size in WebUI is replaced by the GIF frame number, so one full GIF is generated in one batch. If you want to generate multiple GIFs at once, change the batch number instead.

Batch number is NOT the same as batch size. In A1111 WebUI, batch number is above batch size. Batch number is the number of batches run sequentially, while batch size is the number of images generated in parallel within each batch. Increasing batch number mostly costs time. Increasing batch size, which in this extension is the video frame number, raises VRAM usage. You do not need to change batch size at all when you are using this extension.
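The sketch below shows how a request for several GIFs maps onto the two WebUI settings. It also shows why only the frame number enlarges the tensor processed in a single batch. The function name and dictionary keys are hypothetical and are not part of the extension. The SD1.5 latent layout (4 channels at 1/8 of the image resolution) is standard and matches the (4, 16, 64, 64) noise used for 512x512 samples in the FAQ.

```python
# SD1.5 latents have 4 channels at 1/8 of the image resolution (VAE downsampling).
LATENT_CHANNELS = 4
VAE_SCALE = 8

def plan_gif_batches(num_gifs, frames_per_gif, width, height):
    """Hypothetical helper: map a GIF request onto WebUI batch settings."""
    if width % VAE_SCALE or height % VAE_SCALE:
        raise ValueError("width and height should be multiples of 8")
    return {
        # Sequential: one GIF per batch, so this mostly costs time.
        "batch_count": num_gifs,
        # Parallel: every frame of one GIF is in the same batch, so this costs VRAM.
        "batch_size": frames_per_gif,
        # Latent tensor denoised in each batch: (frames, channels, h/8, w/8).
        "latent_shape": (frames_per_gif, LATENT_CHANNELS,
                         height // VAE_SCALE, width // VAE_SCALE),
    }

print(plan_gif_batches(4, 16, 512, 512))
```

Going from 1 to 4 GIFs only changes `batch_count`. Going from 16 to 32 frames doubles the first dimension of the latent tensor in every batch.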

You might also be interested in another extension I created: Segment Anything for Stable Diffusion WebUI.

How to Use

Install this extension via link.
Download motion modules and put the model weights under sd-webui-animatediff/model/. If you want to save the model weights in another directory, go to Settings/AnimateDiff. See Motion Module Model Zoo for a list of available motion modules.
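To confirm that the weights landed in the right place, you can list the motion module files in the model directory. This is a minimal sketch and the function name is hypothetical. It assumes that motion modules use the `.ckpt` and `.pth` extensions seen in the Model Zoo, and that the extension lives under WebUI's `extensions/` folder.

```python
from pathlib import Path

# Assumption: motion modules use the file types listed in the Model Zoo.
MOTION_MODULE_SUFFIXES = {".ckpt", ".pth"}

def list_motion_modules(model_dir):
    """Hypothetical helper: return sorted motion module filenames in model_dir."""
    path = Path(model_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"motion module directory not found: {path}")
    # Only look at regular files directly inside the directory.
    return sorted(
        p.name for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in MOTION_MODULE_SUFFIXES
    )

if __name__ == "__main__":
    # Assumed default location relative to the WebUI root.
    print(list_motion_modules("extensions/sd-webui-animatediff/model"))
```

An empty list means the weights were placed elsewhere, or the directory was changed in Settings/AnimateDiff.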

WebUI

Go to txt2img if you want to try txt2gif and img2img if you want to try img2gif.
Choose an SD1.5 checkpoint, write prompts, set configurations such as image width/height. If you want to generate multiple GIFs at once, please change batch number, instead of batch size.
Enable the AnimateDiff extension, set up each parameter, and click Generate.
1. Number of frames — The model is trained with 16 frames, so it’ll give the best results when the number of frames is set to 16.
2. Frames per second — How many frames (images) are shown every second. If 16 frames are generated at 8 frames per second, your GIF’s duration is 2 seconds.
3. Loop number — How many times the GIF is played. A value of 0 means the GIF never stops playing.
You should see the output GIF in the output gallery. You can access the GIF output at stable-diffusion-webui/outputs/{txt2img or img2img}-images/AnimateDiff. You can also access the individual image frames at stable-diffusion-webui/outputs/{txt2img or img2img}-images/{date}.
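The relationship between frames, frames per second and loop number can be sketched with Pillow's animated GIF writer (`Image.save` with `save_all`, `append_images`, `duration` in milliseconds, and `loop`, where `0` loops forever). This is an illustration only and does not claim to be how the extension writes its GIFs. The helper names are hypothetical.

```python
def gif_timing(num_frames, fps):
    """Return (total duration in seconds, display time per frame in milliseconds)."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps, 1000 / fps

def save_gif(frames, path, fps, loop=0):
    """Write a list of PIL images as an animated GIF; loop=0 repeats forever."""
    _, frame_ms = gif_timing(len(frames), fps)
    # Pillow takes the per-frame duration in milliseconds. The GIF format stores
    # delays in hundredths of a second, so some fps values end up rounded.
    frames[0].save(path, save_all=True, append_images=frames[1:],
                   duration=round(frame_ms), loop=loop)

print(gif_timing(16, 8))  # 16 frames at 8 fps
```

With the default 16 frames at 8 fps, each frame is shown for 125 ms and the GIF lasts 2 seconds, matching the example in the parameter list.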

API

#42

Motion Module Model Zoo

mm_sd_v14.ckpt & mm_sd_v15.ckpt by @guoyww: Google Drive | HuggingFace | CivitAI | Baidu NetDisk
mm-Stabilized_high.pth & mm-Stabilized_mid.pth by @manshoety: HuggingFace

Update

2023/07/20 v1.1.0: fix gif duration, add loop number, remove auto-download, remove xformers, remove instructions on gradio UI, refactor README, add sponsor QR code.
2023/07/24 v1.2.0: fix incorrect insertion of motion modules, add option to change path to save motion modules in Settings/AnimateDiff, fix loading different motion modules.
2023/09/04 v1.3.0: support any community models with the same architecture; fix grey problem via #63 (credit to @TDS4874 and @opparco)

TODO

This TODO list will most likely be resolved sequentially.

FAQ

Q: I am using a remote server which blocks Google. What should I do?

A: You will have to find a way to download motion modules locally and re-upload to your server.
Q: How much VRAM do I need?

A: Currently, you can run WebUI with this extension on an NVIDIA 3090. I cannot guarantee that it works on other GPUs. Actual VRAM usage depends on your image size and video frame number, so you can reduce either of them to lower VRAM usage. The default setting (displayed in the Samples/txt2img section) consumes 12GB of VRAM. More VRAM information will be added later.
Q: Can I generate a video instead of a GIF?

A: Unfortunately, you cannot. The whole batch of frames passes through a transformer module together, which prevents us from generating video frames sequentially. We look forward to future developments of deep learning for video generation.
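A heavily simplified sketch of attention along the frame axis shows why the frames cannot be produced one at a time. It is single-head self-attention with no learned query/key/value projections, which is an assumption made for brevity and not AnimateDiff's actual motion module. Each spatial position becomes a sequence over frames, so every output frame mixes information from all input frames.

```python
import numpy as np

def temporal_self_attention(x):
    """Simplified attention across frames. x has shape (frames, channels, height, width)."""
    f, c, h, w = x.shape
    # (f, c, h, w) -> (h*w, f, c): one sequence of frames per spatial position.
    seq = x.transpose(2, 3, 0, 1).reshape(h * w, f, c)
    # Frame-to-frame similarity at each position: (h*w, f, f).
    scores = seq @ seq.transpose(0, 2, 1) / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over frames
    out = weights @ seq  # each frame becomes a weighted mix of all frames
    # (h*w, f, c) -> (f, c, h, w)
    return out.reshape(h, w, f, c).transpose(2, 3, 0, 1)

rng = np.random.default_rng(0)
latents = rng.standard_normal((16, 4, 8, 8))
changed = latents.copy()
changed[15] += 1.0  # modify only the last frame
# The first frame's output changes too, so frames depend on each other.
print(np.allclose(temporal_self_attention(latents)[0],
                  temporal_self_attention(changed)[0]))
```

The check prints `False`: editing frame 15 alters the output of frame 0. Generating frame 0 therefore requires every other frame in the same batch.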
Q: Can I use SDXL to generate GIFs?

A: At least for now, you cannot. This extension essentially injects multiple motion modules into the SD1.5 UNet. It does not work for other variants of SD, such as SD2.1 and SDXL. I'm not sure what will happen if you force-add motion modules to SD2.1 or SDXL. Further experiments are needed.
Q: Can I use this extension to do gif2gif?

A: Due to the 1-batch behavior of AnimateDiff, it is probably not possible to support gif2gif. However, I need to discuss this with the authors of AnimateDiff.
Q: Can I use xformers?

A: Yes, but it will not be applied to AnimateDiff due to a weird bug. I will try other optimizations. Note that xformers will change the GIF you generate.
Q: How can I reproduce the result in Samples/txt2img section?

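A: You must replace the body of create_random_tensors with
```
    import torch
    from einops import rearrange
    from modules import shared  # A1111 module that provides shared.device
    torch.manual_seed(<seed>)  # replace <seed> with the seed of the sample
    # Sample (c, f, h, w) noise for 16 frames of 64x64 latents, then put frames first
    x = rearrange(torch.randn((4, 16, 64, 64), device=shared.device), 'c f h w -> f c h w')
    return x
```
and retry. A1111 generates random tensors in a completely different way. This only works for WebUI < v1.6.0. These instructions will be updated after I look into the source code of the new random tensor generation logic.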
Q: v1.2.0 does not work for img2img. Why?

A: I don't know. I will try to figure out why very soon.