An adaptation of DiscoDiffusion (https://colab.research.google.com/drive/1sHfRn5Y0YKYKi1k-ifUSBFRNJ8_1sa39#scrollTo=BGBzhk3dpcGO) to run locally, improve code quality, and speed it up. So far the code has only been cleaned up a bit, and the LPIPS network initialization was removed for the case where only an input text is used.
Around 11 GB of GPU VRAM is needed for the current default settings of --width 1280 and --height 768. Decreasing the image size is the easiest way to make it fit on smaller GPUs.
With default settings it takes 7:46 minutes on an RTX 2080 Ti, 19:01 minutes on a GTX 1080 Ti, and 17:01 minutes on a Titan XP to generate images like these:
The meaning of life by Picasso
The meaning of life by Greg Rutkowski
Forgot the prompt, but it was about Pikachu staring at a tumultuous sea of blood; adapted from the original DiscoDiffusion notebook
If you're using Windows, please also refer to the section below called Setup for Windows!
First run `ipython3 diffuse.py` to set everything up and to clone the repositories. IMPORTANT: you need to use ipython instead of python because I was lazy and all the git clone calls etc. are run via ipython.
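For clarity, the first run only performs this setup; generation runs are shown further below:

```bash
# First run: sets everything up and clones the required repositories.
# ipython (not plain python) is needed because the setup commands are executed via IPython.
ipython3 diffuse.py
```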
At the moment you can only set a single text as a target, but this should be improved in the future. It only runs with GPU support for now.
Use it like this:
python3 diffuse.py --text "The meaning of life" --gpu [Optional: device number of the GPU to run this on] --root_path [Optional: path to the output folder, default is "out_diffusion" in the local dir]
If you only have 8 GB of VRAM on your GPU, the highest resolution you can run is 832x512 or 896x448. Set it by adding --width 832 --height 512, for example. Thanks @Jotunblood for testing!
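Putting the flags together, a full run might look like this (the prompt, GPU index, and output folder are just example values):

```bash
# Example: generate from a text prompt on GPU 0, writing outputs to ./out_diffusion.
# The resolution is lowered to 832x512 so the run fits into roughly 8 GB of VRAM.
python3 diffuse.py --text "The meaning of life by Picasso" \
    --gpu 0 \
    --root_path out_diffusion \
    --width 832 --height 512
```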
You can also set --out_name [Optional: name the outputs in your root_path accordingly for a better overview] and --sharpen_preset [Optional: one of ('Off', 'Faster', 'Fast', 'Slow', 'Very Slow') to modify the sharpening process at the end. Default: Off].
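For example, to name the outputs and enable sharpening (the output name here is just a placeholder):

```bash
# Store results under a recognizable name and run the "Fast" sharpening preset at the end.
python3 diffuse.py --text "The meaning of life by Picasso" \
    --out_name picasso_life \
    --sharpen_preset Fast
```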
See NotNANtoN#1; the instructions from @JotunBlood are adopted here.
Instructions:
- Install Anaconda
- Create and activate a new environment (don't use base)
- Install PyTorch using the pip command from the PyTorch website (not conda)
- Install IPython
- Add the conda-forge channel to Anaconda: conda config --add channels conda-forge
- Install the dependency packages using conda where available, otherwise use pip. Packages of relevance: OpenCV, pandas, timm, lpips, requests, pytorch-lightning, and omegaconf. There might be one or two others (a full command sketch follows this list).
- Run ipython diffuse.py
- If it goes all the way through, congrats. If you hit SSL errors, add the following lines near the top of diffuse.py (I did it around line 7):
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
- If you get Frame Prompt: [''] and a failed output, make sure you're running diffuse.py with python3 and not IPython :)
- If you get a CUDA out of memory warning, pass a lower resolution like --width 720 --height 480 when you run it
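Putting the steps above together, the setup might look roughly like the sketch below. The environment name, the Python version, and the CUDA-specific PyTorch command are assumptions; use the pip command that the PyTorch website generates for your system.

```bash
# Create and activate a fresh conda environment (don't use base).
conda create -n disco python=3.9
conda activate disco

# Install PyTorch with pip (not conda); take the exact command from https://pytorch.org.
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu117

# Add the conda-forge channel and install the dependencies available there.
conda config --add channels conda-forge
conda install ipython opencv pandas

# Remaining dependencies via pip.
pip install timm lpips requests pytorch-lightning omegaconf

# First run: sets everything up and clones the required repositories.
ipython diffuse.py
```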
This section is outdated as of v2
Setting | Description | Default |
---|---|---|
Your vision: | | |
`text_prompts` | A description of what you'd like the machine to generate. Think of it like writing the caption below your image on a website. | N/A |
`image_prompts` | Think of these images more as a description of their contents. | N/A |
Image quality: | | |
`clip_guidance_scale` | Controls how much the image should look like the prompt. | 1000 |
`tv_scale` | Controls the smoothness of the final output. | 150 |
`range_scale` | Controls how far out of range RGB values are allowed to be. | 150 |
`sat_scale` | Controls how much saturation is allowed. From nshepperd's JAX notebook. | 0 |
`cutn` | Controls how many crops to take from the image. | 16 |
`cutn_batches` | Accumulate CLIP gradient from multiple batches of cuts. | 2 |
Init settings: | | |
`init_image` | URL or local path. | None |
`init_scale` | This enhances the effect of the init image; a good value is 1000. | 0 |
`skip_steps` | Controls the starting point along the diffusion timesteps. | 0 |
`perlin_init` | Option to start with random Perlin noise. | False |
`perlin_mode` | ('gray', 'color') | 'mixed' |
Advanced: | | |
`skip_augs` | Controls whether to skip torchvision augmentations. | False |
`randomize_class` | Controls whether the ImageNet class is randomly changed each iteration. | True |
`clip_denoised` | Determines whether CLIP discriminates a noisy or denoised image. | False |
`clamp_grad` | Experimental: using adaptive gradient clipping in the cond_fn. | True |
`seed` | Choose a random seed and print it at the end of the run for reproduction. | random_seed |
`fuzzy_prompt` | Controls whether to add multiple noisy prompts to the prompt losses. | False |
`rand_mag` | Controls the magnitude of the random noise. | 0.1 |
`eta` | DDIM hyperparameter. | 0.5 |