From RiversHaveWings (Katherine Crowson).
Generate vibrant and detailed images using only text.
See captions and more generations in the Gallery.
See also: VQGAN-CLIP
```sh
❯ git clone https://github.com/afiaka87/clip-guided-diffusion.git && cd clip-guided-diffusion
❯ git clone https://github.com/afiaka87/guided-diffusion.git
❯ pip3 install -e guided-diffusion
❯ python3 setup.py install
❯ cgd -txt "puddle"
3%|██▉       | 28/1000 [00:07<04:08, 3.91it/s]
```
```python
# Initialize the diffusion generator
from pathlib import Path

import kornia.augmentation as K

from cgd import clip_guided_diffusion
import cgd_util

prompt = "An image of a fox in a forest."

# Pass in your own augmentations (supports torchvision.transforms/kornia.augmentation).
# (Defaults to no augmentations, which is likely best unless you're doing something special.)
aug_list = [
    K.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=0.1),
    K.RandomMotionBlur(kernel_size=(1, 5), angle=15, direction=0.5),
    K.RandomHorizontalFlip(p=0.5),
]

# Not defined in the original snippet; these mirror the CLI defaults
# (output directory "outputs", save every step).
prefix_path = Path("outputs")
save_frequency = 1

# Remove non-alphanumeric and whitespace characters from the prompt for the directory name.
outputs_path = cgd_util.txt_to_dir(base_path=prefix_path, txt=prompt)
outputs_path.mkdir(exist_ok=True)

# `cgd_samples` is a generator that yields the output image paths.
cgd_samples = clip_guided_diffusion(prompt=prompt, prefix=outputs_path, augs=aug_list)

# Image paths are collected in `all_images` for e.g. video generation at the end.
all_images = []
for step, output_path in enumerate(cgd_samples):
    if step % save_frequency == 0:
        print(f"Saving image {step} to {output_path}")
        all_images.append(output_path)
```
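The collected paths can then be stitched into a quick progress animation. A minimal sketch using Pillow (an assumption; any external tool such as ffmpeg works just as well):

```python
# Minimal sketch: turn the frames gathered in `all_images` into an animated GIF.
# Assumes Pillow is installed and each entry is a path to a saved PNG, in order.
from PIL import Image

frames = [Image.open(p).convert("RGB") for p in all_images]
if frames:
    frames[0].save(
        "progress.gif",
        save_all=True,
        append_images=frames[1:],
        duration=100,  # milliseconds per frame
        loop=0,        # loop forever
    )
```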
- Respective guided-diffusion checkpoints from OpenAI will be downloaded to `~/.cache/clip-guided-diffusion/` by default.
- The file `current.png` can be refreshed to see the current image.
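If you want to watch progress programmatically rather than re-opening the file by hand, one option is to poll `current.png` for changes. A small sketch for a notebook session (the display helper is an assumption, not part of the package):

```python
# Hypothetical watcher: re-display current.png whenever its mtime changes.
# Intended for a Jupyter/Colab cell; interrupt the cell to stop the loop.
import time
from pathlib import Path

from IPython.display import Image, clear_output, display

current = Path("current.png")
last_mtime = 0.0
while True:
    if current.exists() and current.stat().st_mtime != last_mtime:
        last_mtime = current.stat().st_mtime
        clear_output(wait=True)
        display(Image(filename=str(current)))
    time.sleep(1)
```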
`--prompt` / `-txt` and `--image_size` / `-size`
- Filename format: `f"{caption}/batch_idx_{j}_iteration_{i}.png"`
- The most recent generation will also be stored in the file `current.png`.
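Because the filename encodes both the batch index and the iteration, a finished run can be gathered and re-ordered after the fact. A sketch (the directory name is illustrative; substitute whatever directory your caption maps to):

```python
# Sketch: gather saved outputs using the documented filename pattern
# "batch_idx_{j}_iteration_{i}.png" and order them by iteration.
import re
from pathlib import Path

pattern = re.compile(r"batch_idx_(\d+)_iteration_(\d+)\.png$")

def iter_outputs(caption_dir):
    """Yield (batch_idx, iteration, path) for every saved output in the directory."""
    for path in Path(caption_dir).glob("batch_idx_*_iteration_*.png"):
        match = pattern.search(path.name)
        if match:
            batch_idx, iteration = map(int, match.groups())
            yield batch_idx, iteration, path

# Frames of the first batch, ordered by iteration ("outputs/my_caption" is a placeholder).
frames = [p for j, i, p in sorted(iter_outputs("outputs/my_caption")) if j == 0]
```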
```sh
❯ cgd -size 256 -txt "32K HUHD Mushroom"
Step 999, output 0:
100%|███████████████| 1000/1000 [12:30<00:00, 1.02it/s]
```
`--class_score` / `-score`
- Scores are used to weight class selection.

```sh
❯ cgd -score -cgs 200 -cutn 64 -size 256 -respace 'ddim100' --prompt "cat painting"
```
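Conceptually, class scoring compares the prompt against the ImageNet class names with CLIP and uses the similarities to weight the otherwise random choice of class conditioning (see `--top_n` below). A rough illustration of that idea, not the package's actual code (`IMAGENET_CLASSES` is a placeholder for the full list of 1000 names):

```python
# Rough sketch of CLIP-weighted class selection (illustrative only).
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

IMAGENET_CLASSES = ["tench", "goldfish", "great white shark"]  # placeholder for all 1000 names

with torch.no_grad():
    prompt_emb = model.encode_text(clip.tokenize(["cat painting"]).to(device))
    class_emb = model.encode_text(clip.tokenize(IMAGENET_CLASSES).to(device))
    prompt_emb = prompt_emb / prompt_emb.norm(dim=-1, keepdim=True)
    class_emb = class_emb / class_emb.norm(dim=-1, keepdim=True)
    scores = (class_emb @ prompt_emb.T).squeeze(-1)  # cosine similarity per class

top = scores.topk(min(3, len(IMAGENET_CLASSES)))      # roughly what --top_n controls
weights = torch.softmax(top.values.float(), dim=-1)   # similarity-weighted selection
chosen = top.indices[torch.multinomial(weights, 1)].item()
print("sampled class:", IMAGENET_CLASSES[chosen])
```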
`--diffusion_steps` / `-steps` (default: 1000)
- Options: `25`, `50`, `150`, `250`, `500`, `1000`
- The default of `1000` is the most accurate and is recommended.
`--timestep_respacing` or `-respace` (default: 1000)
- Use fewer timesteps over the same diffusion schedule.
- e.g. `-respace "ddim25"`
- Options: `25`, `50`, `150`, `250`, `500`, `1000`, `ddim25`, `ddim50`, `ddim150`, `ddim250`, `ddim500`, `ddim1000`
```sh
❯ cgd -respace 'ddim50' -txt "cat painting"
```
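Respacing keeps the original 1000-step noise schedule but only evaluates a subset of evenly spaced timesteps. A simplified illustration of the idea (the real spacing logic lives in guided-diffusion and also handles the `ddim*` variants):

```python
# Simplified illustration: pick `respace` evenly spaced steps out of the full schedule.
def respaced_timesteps(num_timesteps=1000, respace=50):
    stride = num_timesteps / respace
    return sorted({round(i * stride) for i in range(respace)})

steps = respaced_timesteps(respace=50)
print(len(steps), steps[:5])  # 50 [0, 20, 40, 60, 80]
```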
- Smaller `-respace` values can benefit a lot from class scoring.

```sh
❯ cgd -score -respace 50 -txt "cat painting"
```
`--prompt_min` / `-min`
- Loss for `prompt_min` is weighted 0.5, a value found in experimentation.
- Also used to weight class selection with `-score`.

```sh
❯ cgd -txt "32K HUHD Mushroom" -min "green grass"
```
`--init_image` / `-init` and `--skip_timesteps` / `-skip`
- Blend an image with the diffusion for a number of steps.
- `--skip_timesteps` / `-skip` is the number of timesteps to spend blending; it needs to be set in order to blend an image.
- A good range for `-respace=1000` is 350 to 650 (a sweep sketch follows the example below).

```sh
❯ cgd -txt "A mushroom in the style of Vincent Van Gogh" \
    -init "images/32K_HUHD_Mushroom.png" \
    -skip 500
```
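Because the useful amount of blending depends on the image and prompt, it can be worth sweeping a few `-skip` values. A small sketch that drives the documented CLI via `subprocess` (paths and values are illustrative):

```python
# Sketch: try several skip_timesteps values for the same init image via the CLI.
import subprocess

for skip in (350, 450, 550, 650):  # within the suggested 350-650 range for -respace=1000
    subprocess.run(
        [
            "cgd",
            "-txt", "A mushroom in the style of Vincent Van Gogh",
            "-init", "images/32K_HUHD_Mushroom.png",
            "-skip", str(skip),
            "-dir", f"outputs_skip_{skip}",  # separate output directory per run
        ],
        check=True,
    )
```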
- The default image size is 128px.
- Available image sizes are 64, 128, 256, and 512 pixels (square).
- The 512x512 pixel checkpoint requires a GPU with at least 12GB of VRAM.
- `--clip_guidance_scale` and `--tv_scale` will require experimentation.
- The 64x64 diffusion checkpoint is challenging to work with and often results in an all-white or all-black image; this is much less of an issue when using an existing image of some sort.
```sh
❯ cgd \
    --init_image=images/32K_HUHD_Mushroom.png \
    --skip_timesteps=500 \
    --image_size 64 \
    --prompt "8K HUHD Mushroom"

❯ cgd --image_size 512 --prompt "8K HUHD Mushroom"
```
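Since the 512x512 checkpoint needs roughly 12GB of VRAM, it can save time to check the available GPU memory before launching a long run. A small sketch using PyTorch:

```python
# Sketch: warn before attempting the 512px checkpoint on a GPU with < 12 GB of VRAM.
import torch

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gb < 12:
        print(f"Only {total_gb:.1f} GB of VRAM; consider --image_size 256 or smaller.")
else:
    print("No CUDA device found; generation will be extremely slow on CPU.")
```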
```
  -h, --help            show this help message and exit
  --prompt PROMPT, -txt PROMPT
                        the prompt to reward (default: )
  --prompt_min PROMPT_MIN, -min PROMPT_MIN
                        the prompt to penalize (default: None)
  --min_weight MIN_WEIGHT, -min_wt MIN_WEIGHT
                        weight of the prompt to penalize (default: 0.1)
  --image_size IMAGE_SIZE, -size IMAGE_SIZE
                        Diffusion image size. Must be one of [64, 128, 256, 512]. (default: 128)
  --init_image INIT_IMAGE, -init INIT_IMAGE
                        Blend an image with diffusion for n steps (default: None)
  --skip_timesteps SKIP_TIMESTEPS, -skip SKIP_TIMESTEPS
                        Number of timesteps to blend image for. CLIP guidance occurs after this. (default: 0)
  --prefix PREFIX, -dir PREFIX
                        output directory (default: outputs)
  --checkpoints_dir CHECKPOINTS_DIR, -ckpts CHECKPOINTS_DIR
                        Path subdirectory containing checkpoints. (default: checkpoints)
  --batch_size BATCH_SIZE, -bs BATCH_SIZE
                        the batch size (default: 1)
  --clip_guidance_scale CLIP_GUIDANCE_SCALE, -cgs CLIP_GUIDANCE_SCALE
                        Scale for CLIP spherical distance loss. Values will need tinkering for different settings. (default: 1000)
  --tv_scale TV_SCALE, -tvs TV_SCALE
                        Scale for denoising loss (default: 100)
  --class_score, -score
                        Enables CLIP guided class randomization. (default: False)
  --top_n TOP_N, -top TOP_N
                        Top n imagenet classes compared to phrase by CLIP (default: 1000)
  --seed SEED, -seed SEED
                        Random number seed (default: 0)
  --save_frequency SAVE_FREQUENCY, -freq SAVE_FREQUENCY
                        Save frequency (default: 1)
  --diffusion_steps DIFFUSION_STEPS, -steps DIFFUSION_STEPS
                        Diffusion steps (default: 1000)
  --timestep_respacing TIMESTEP_RESPACING, -respace TIMESTEP_RESPACING
                        Timestep respacing (default: 1000)
  --num_cutouts NUM_CUTOUTS, -cutn NUM_CUTOUTS
                        Number of randomly cut patches to distort from diffusion. (default: 32)
  --cutout_power CUTOUT_POWER, -cutpow CUTOUT_POWER
                        Cutout size power (default: 0.5)
  --clip_model CLIP_MODEL, -clip CLIP_MODEL
                        clip model name. Should be one of: ('ViT-B/16', 'ViT-B/32', 'RN50', 'RN101', 'RN50x4', 'RN50x16') (default: ViT-B/32)
  --class_cond CLASS_COND, -cond CLASS_COND
                        Use class conditional. Required for image sizes other than 256 (default: True)
```
This code is currently under active development and subject to frequent changes. Please file an issue if you have any constructive feedback, questions, or issues with the code or the Colab notebook.
```sh
git clone https://github.com/afiaka87/clip-guided-diffusion.git
cd clip-guided-diffusion
git clone https://github.com/afiaka87/guided-diffusion.git
python3 -m venv cgd_venv
source cgd_venv/bin/activate
pip install -r requirements.txt
pip install -e guided-diffusion
python -m unittest discover
```
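After the editable installs, a quick import check confirms everything resolved before running the test suite (a sketch; it assumes the guided-diffusion fork keeps the upstream `guided_diffusion` package name):

```python
# Sketch: smoke-test the installation.
import torch
import guided_diffusion  # assumption: the fork keeps OpenAI's package name
from cgd import clip_guided_diffusion

print("CUDA available:", torch.cuda.is_available())
```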