T2I-Adapters naturally support using multiple adapters together. The corresponding running command is given in the testing examples below (see the composable adapters example).
🚩 New Features/Updates
- ✅ Feb. 23, 2023. Added the depth adapter t2iadapter_depth_sd14v1.pth. See more info in the Adapter Zoo.
Official implementation of T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models.
We propose T2I-Adapter, a simple and small (~70M parameters, ~300MB of storage) network that can provide extra guidance to pre-trained text-to-image models while keeping the original large text-to-image models frozen.
T2I-Adapter aligns internal knowledge in T2I models with external control signals. We can train various adapters according to different conditions, and achieve rich control and editing effects.
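As a rough illustration of this design, the snippet below is a minimal sketch, not the repository's actual implementation: a small trainable adapter maps a condition image (e.g., a sketch or depth map) to multi-scale features that are added to the encoder features of the frozen diffusion UNet. The module names, channel sizes, and feature scales are assumptions.

```python
# Minimal conceptual sketch of the T2I-Adapter idea (illustrative only).
# A small trainable network turns a condition image into one feature map per
# UNet encoder scale; the large text-to-image model itself stays frozen.
import torch
import torch.nn as nn

class TinyAdapter(nn.Module):
    def __init__(self, cond_channels=3, channels=(64, 128, 256, 512)):
        super().__init__()
        # Downsample the condition image once, then extract one feature map
        # per (assumed) UNet encoder scale.
        self.stem = nn.Conv2d(cond_channels, channels[0], 3, stride=2, padding=1)
        self.stages = nn.ModuleList()
        in_ch = channels[0]
        for out_ch in channels:
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.ReLU(),
                nn.Conv2d(out_ch, out_ch, 3, padding=1),
            ))
            in_ch = out_ch

    def forward(self, cond):
        feats = []
        x = self.stem(cond)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # to be added to the frozen UNet's encoder features
        return feats

# Only the adapter's parameters are trained; the text-to-image model is frozen.
adapter = TinyAdapter()
cond = torch.randn(1, 3, 512, 512)  # e.g. a sketch or depth map rendered as RGB
for f in adapter(cond):
    print(tuple(f.shape))
```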
Put the downloaded models in the T2I-Adapter/models folder.
- The T2I-Adapters can be downloaded from https://huggingface.co/TencentARC/T2I-Adapter.
- The pretrained Stable Diffusion v1.4 model can be downloaded from https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/tree/main. You need to download the sd-v1-4.ckpt file.
- [Optional] If you want to use the Anything v4.0 model, you can download the pretrained model from https://huggingface.co/andite/anything-v4.0/tree/main. You need to download the anything-v4.0-pruned.ckpt file.
- The pretrained clip-vit-large-patch14 folder can be downloaded from https://huggingface.co/openai/clip-vit-large-patch14/tree/main. Remember to download the whole folder!
- The pretrained keypose detection models include FasterRCNN (human detection) from https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth and HRNet (pose detection) from https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth.
After downloading, the folder structure should be like this:
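A rough sketch based on the files listed above; which optional files appear depends on what you download:

```
T2I-Adapter/models/
├── t2iadapter_depth_sd14v1.pth        (and any other adapter checkpoints)
├── sd-v1-4.ckpt
├── anything-v4.0-pruned.ckpt          (optional)
├── anything-v4.0.vae.pt               (optional)
├── clip-vit-large-patch14/            (the whole folder)
├── faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
└── hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth
```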
- Python >= 3.6 (we recommend Anaconda or Miniconda)
- PyTorch >= 1.4
pip install -r requirements.txt
- If you want to use the full function of keypose-guided generation, you need to install MMPose. For details please refer to https://github.com/open-mmlab/mmpose.
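Before running the tests, a quick way to confirm the environment matches the requirements above is the following minimal sketch, which only prints version information:

```python
# Sanity check (sketch only) for the requirements listed above:
# Python >= 3.6 and PyTorch >= 1.4.
import sys
import torch

assert sys.version_info >= (3, 6), "Python >= 3.6 is required"
print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA   :", torch.cuda.is_available())
```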
- Depth to Image Generation
python test_depth.py --prompt "Stormtrooper's lecture, best quality, extremely detailed" --path_cond examples/depth/sd.png --ckpt models/v1-5-pruned-emaonly.ckpt --type_in image --sampler ddim --scale 9 --cond_weight 1.5
- Sketch to Image Generation
python test_sketch.py --prompt "A car with flying wings" --path_cond examples/sketch/car.png --ckpt models/sd-v1-4.ckpt --type_in sketch
- Image to Sketch to Image Generation
python test_sketch.py --prompt "A beautiful girl" --path_cond examples/sketch/human.png --ckpt models/sd-v1-4.ckpt --type_in image
- The adapter is trained on Stable Diffusion v1.4 but can be generalized to other models, such as Anything-v4, an anime diffusion model.
python test_sketch.py --prompt "1girl, masterpiece, high-quality, high-res" --path_cond examples/anything_sketch/human.png --ckpt models/anything-v4.0-pruned.ckpt --ckpt_vae models/anything-v4.0.vae.pt --type_in image
- Keypose to Image Generation
python test_keypose.py --prompt "A beautiful girl" --path_cond examples/keypose/iron.png --type_in pose
- Image to Image Generation
python test_keypose.py --prompt "A beautiful girl" --path_cond examples/sketch/human.png --type_in image
- Generating anime images with the Anything-v4 model
python test_keypose.py --prompt "A beautiful girl" --path_cond examples/sketch/human.png --ckpt models/anything-v4.0-pruned.ckpt --ckpt_vae models/anything-v4.0.vae.pt --type_in image
- Segmentation to Image Generation
python test_seg.py --prompt "A black Honda motorcycle parked in front of a garage" --path_cond examples/seg/motor.png
- Combining multiple adapters (segmentation and sketch)
python test_composable_adapters.py --prompt "An all white kitchen with an electric stovetop" --seg_cond_path examples/seg_sketch/mask.png --sketch_cond_path examples/seg_sketch/edge.png --sketch_cond_weight 0.5
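Conceptually, combining adapters amounts to a weighted sum of each adapter's multi-scale features before they are injected into the frozen UNet; the weights correspond to options such as --cond_weight and --sketch_cond_weight. The following is a minimal illustrative sketch with stand-in features, not the script's actual code:

```python
# Illustrative sketch of composing two adapters: per-scale features from each
# adapter are summed with per-adapter weights before guiding the frozen UNet.
import torch

def fake_adapter_features(cond, channels=(64, 128, 256, 512), base=64):
    # Stand-in for a real adapter: one random feature map per UNet scale.
    b = cond.shape[0]
    return [torch.randn(b, c, base // (2 ** i), base // (2 ** i))
            for i, c in enumerate(channels)]

seg_cond = torch.randn(1, 3, 512, 512)     # segmentation map as an image
sketch_cond = torch.randn(1, 1, 512, 512)  # sketch as a single-channel image

# Per-adapter weights, analogous to --cond_weight / --sketch_cond_weight.
weights = {"seg": 1.0, "sketch": 0.5}

seg_feats = fake_adapter_features(seg_cond)
sketch_feats = fake_adapter_features(sketch_cond)

# Weighted element-wise sum of the two adapters' features at each scale.
combined = [weights["seg"] * fs + weights["sketch"] * fk
            for fs, fk in zip(seg_feats, sketch_feats)]
print([tuple(f.shape) for f in combined])
```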
- Local editing with the sketch adapter
python test_sketch_edit.py --prompt "A white cat" --path_cond examples/edit_cat/edge_2.png --path_x0 examples/edit_cat/im.png --path_mask examples/edit_cat/mask.png
The following is the detailed structure of a Stable Diffusion model with the T2I-Adapter.
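Complementing the adapter sketch above, this minimal, hypothetical snippet illustrates where the adapter features enter the frozen UNet encoder: each feature map is added to the encoder activation at the matching resolution. The block structure and channel sizes are assumptions, not the actual model definition.

```python
# Illustrative sketch of adapter-feature injection into a simplified UNet
# encoder (assumed structure): each adapter feature map is added to the
# encoder output operating at the same spatial resolution.
import torch
import torch.nn as nn

encoder_blocks = nn.ModuleList([
    nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
    for c_in, c_out in [(4, 64), (64, 128), (128, 256), (256, 512)]
])

x = torch.randn(1, 4, 64, 64)  # latent input of the diffusion UNet
adapter_feats = [torch.randn(1, c, 64 // (2 ** (i + 1)), 64 // (2 ** (i + 1)))
                 for i, c in enumerate([64, 128, 256, 512])]

skips = []
for block, feat in zip(encoder_blocks, adapter_feats):
    x = block(x) + feat   # adapter guidance added at the matching scale
    skips.append(x)
print([tuple(s.shape) for s in skips])
```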
The corresponding edge maps are predicted by PiDiNet. The sketch T2I-Adapter generalizes well to other similar sketch types, for example, sketches from the Internet and user scribbles.
The keypose results are predicted by MMPose. With keypose guidance, the keypose T2I-Adapter can also help generate animals with the same keypose, for example, pandas and tigers.
Once the T2I-Adapter is trained, it acts as a plug-and-play module and can be seamlessly integrated into finetuned diffusion models, for example, Anything-v4.0, without re-training.
When combined with the inpainting mode of Stable Diffusion, we can realize local editing with user-specific guidance.
The adapter can be used to enhance the ability of Stable Diffusion to combine different concepts.
We can realize sequential editing with adapter guidance.
Stable Diffusion results guided by the segmentation and sketch adapters together.
Thanks to haofanwang for providing a tutorial on using T2I-Adapter with diffusers.
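For reference, the snippet below is a rough, hypothetical sketch of what using a T2I-Adapter through the diffusers integration can look like; the model IDs, pipeline arguments, and the depth-map path are assumptions, so please consult the tutorial above for the exact, current API and preprocessing.

```python
# Hypothetical sketch of T2I-Adapter usage via diffusers (not this repo's code);
# check the linked tutorial for the exact API and input preprocessing.
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter
from diffusers.utils import load_image

adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_depth_sd14v1")
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", adapter=adapter, torch_dtype=torch.float32
)
depth_map = load_image("path/to/depth_map.png")  # placeholder depth-map image
image = pipe(
    "Stormtrooper's lecture, best quality, extremely detailed", image=depth_map
).images[0]
image.save("depth_guided.png")
```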