DiffSynth Studio is a diffusion engine. We have restructured the architectures, including the text encoder, UNet, and VAE, maintaining compatibility with models from the open-source community while enhancing computational performance. We provide many interesting features. Enjoy the magic of diffusion models!
Currently, DiffSynth Studio supports the following models:
- CogVideoX
- FLUX
- ExVideo
- Kolors
- Stable Diffusion 3
- Stable Video Diffusion
- Hunyuan-DiT
- RIFE
- ESRGAN
- Ip-Adapter
- AnimateDiff
- ControlNet
- Stable Diffusion XL
- Stable Diffusion
- October 8, 2024. We release the extended LoRA based on CogVideoX-5B and ExVideo. You can download this model from ModelScope or HuggingFace.
- August 22, 2024. CogVideoX-5B is supported in this project. See here. We provide several interesting features for this text-to-video model, including:
  - Text-to-video generation
  - Video editing
  - Self-upscaling
  - Video interpolation
- August 22, 2024. We have implemented an interesting painter that supports all text-to-image models. Now you can create stunning images using the painter, with assistance from AI!
  - Use it in our WebUI.
- August 21, 2024. FLUX is supported in DiffSynth-Studio.
  - Enable CFG and highres-fix to improve visual quality. See here.
  - LoRA, ControlNet, and additional models will be available soon.
- June 21, 2024. 🔥🔥🔥 We propose ExVideo, a post-tuning technique aimed at enhancing the capability of video generation models. We have extended Stable Video Diffusion to generate long videos of up to 128 frames.
  - Project Page
  - Source code is released in this repo. See examples/ExVideo.
  - Models are released on HuggingFace and ModelScope.
  - Technical report is released on arXiv.
  - You can try ExVideo in this Demo!
- June 13, 2024. DiffSynth Studio is transferred to ModelScope. The developers have transitioned from "I" to "we". Of course, I will still participate in development and maintenance.
- Jan 29, 2024. We propose Diffutoon, a fantastic solution for toon shading.
  - Project Page
  - The source code is released in this project.
  - The technical report (IJCAI 2024) is released on arXiv.
- Dec 8, 2023. We decide to develop a new project, aiming to unleash the potential of diffusion models, especially in video synthesis. Development of this project has started.
- Nov 15, 2023. We propose FastBlend, a powerful video deflickering algorithm.
- Oct 1, 2023. We release an early version of this project, namely FastSDXL, an attempt at building a diffusion engine.
  - The source code is released on GitHub.
  - FastSDXL includes a trainable OLSS scheduler for improved efficiency.
- Aug 29, 2023. We propose DiffSynth, a video synthesis framework.
  - Project Page.
  - The source code is released in EasyNLP.
  - The technical report (ECML PKDD 2024) is released on arXiv.
Install from source code (recommended):
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
Or install from PyPI:
pip install diffsynth
The Python examples are in examples. We provide an overview here.
Download the pre-set models. Model IDs can be found in the config file.
from diffsynth import download_models
download_models(["FLUX.1-dev", "Kolors"])
Download your own models.
from diffsynth.models.downloader import download_from_huggingface, download_from_modelscope
# From ModelScope (recommended)
download_from_modelscope("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.bin", "models/kolors/Kolors/vae")
# From HuggingFace
download_from_huggingface("Kwai-Kolors/Kolors", "vae/diffusion_pytorch_model.fp16.safetensors", "models/kolors/Kolors/vae")
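To sanity-check that a download actually produced files, you can list the target directory. A minimal stdlib-only sketch (the path below matches the Kolors VAE call above; verify the exact file names against your own download):

```python
from pathlib import Path


def downloaded_files(target_dir: str) -> list[str]:
    """List all files under a download target directory (empty list if it is missing)."""
    root = Path(target_dir)
    if not root.exists():
        return []
    return sorted(str(p) for p in root.rglob("*") if p.is_file())


# e.g. after the Kolors VAE download above:
print(downloaded_files("models/kolors/Kolors/vae"))
```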
CogVideoX-5B is released by ZhiPu. We provide an improved pipeline supporting text-to-video, video editing, self-upscaling, and video interpolation. See examples/video_synthesis.
The video on the left is generated using the original text-to-video pipeline, while the video on the right is the result after editing and frame interpolation.
cogvideo.mp4
We trained extended video synthesis models, which can generate 128 frames. See examples/ExVideo.
github_title.mp4
demo.mp4
Render realistic videos in a flat toon style and enable video editing features. See examples/Diffutoon.
Diffutoon.mp4
Diffutoon_edit.mp4
Video stylization without video models. See examples/diffsynth.
winter_stone.mp4
Generate high-resolution images by breaking the resolution limits of diffusion models! See examples/image_synthesis.
LoRA fine-tuning is supported in examples/train.
| FLUX | Stable Diffusion 3 |
|---|---|
| Kolors | Hunyuan-DiT |
| Stable Diffusion | Stable Diffusion XL |
Create stunning images using the painter, with assistance from AI!
video.mp4
This video is not rendered in real time.
Before launching the WebUI, please download models to the folder ./models. See here.
Gradio version:
pip install gradio
python apps/gradio/DiffSynth_Studio.py
Streamlit version:
pip install streamlit streamlit-drawable-canvas
python -m streamlit run apps/streamlit/DiffSynth_Studio.py