These are my scripts for using Stable Diffusion.

- Install `fzf`:

  ```bash
  brew install fzf
  ```

- Clone `kohya_ss` (for training scripts):

  ```bash
  git clone git@github.com:bmaltais/kohya_ss.git
  ```

- Clone this repository:

  ```bash
  git clone git@github.com:Danand/stable-diffusion-scripts.git
  ```

- Add this repository to the `PATH` variable:

  ```bash
  cd stable-diffusion-scripts && \
  echo "export PATH=\"\${PATH}:$(pwd)\"" >> ~/.bashrc
  ```

- Change mode of scripts to executable:

  ```bash
  cd stable-diffusion-scripts && \
  chmod +x *.sh
  ```
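
After updating `~/.bashrc`, you can verify that the scripts are reachable, a minimal check assuming a Bash shell:

```bash
# Reload the shell configuration and confirm one of the scripts is on PATH.
source ~/.bashrc
which sd-models-init.sh
```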
These scripts rely on reusable model folders linked to Stable Diffusion UI wrappers via symlinks:

- `sd-models-init.sh` creates the valid folder layout for further symlinks.
- `sd-models-link.sh` links models from the current working directory (CWD) to Stable Diffusion UI wrappers.
- `sd-models-link-trained-lora.sh` links trained LoRA from CWD to Stable Diffusion UI wrappers.
- `sd-train-init.sh` creates a valid folder layout for further training with kohya_ss scripts.
- `sd-train-lora-fzf.sh` runs an interactive launch of LoRA training via kohya_ss scripts (see the example invocation after this list). The steps involved are:
  - Set `KOHYA_SS_PATH` and `SD_MODELS_PATH` if they differ from the default parent directory `HOME`.
  - Choose the base model (from `SD_MODELS_PATH`).
  - Choose the training script (from `KOHYA_SS_PATH`).
  - Enter the number of training epochs (default: `3`).
  - Enter the training width (default: `512`).
  - Enter the training height (default: `512`).
  - Enter the training seed (default: `12345`).
  - Enter the network dimension (default: `128`).
  - Enter the network alpha (default: `128`).
  - Enter the learning rate (default: `0.0001`).
  - Enter the U-Net learning rate (default: `0.0001`).
  - Enter the text encoder learning rate (default: `5e-5`).
  - Enter the noise offset (default: `0.0`).
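
For example, a launch with non-default locations might look roughly like this. This is only a sketch: the two paths are hypothetical, and it is assumed here that `KOHYA_SS_PATH` and `SD_MODELS_PATH` are picked up from the environment before launching.

```bash
# Hypothetical locations of kohya_ss and the shared models folder:
export KOHYA_SS_PATH="${HOME}/tools/kohya_ss"
export SD_MODELS_PATH="${HOME}/sd-models"

# Launch the interactive wizard; the base model and training script are
# picked via fzf, and epochs, resolution, seed, etc. are entered at the prompts.
sd-train-lora-fzf.sh
```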

`sd-train-subsets-weights-edit.py` sets the number of repeats for each subdirectory of images based on an interactively entered "weight".

- "Step" is the actual iteration of training.
- "Repeats" is the number used to set up the weight of each subject's subset of training images.
- "Epochs" is the multiplier of total steps. The trained LoRA can be saved as a ready-to-use model on each epoch.
Take a look at the training images folder:

```
images/
  1_SubjectName/ # Let's assume that there are 15 images in this folder.
  3_SubjectName/ # Let's assume that there are 5 images in this folder.
```

Let's assume that images from the folder `1_SubjectName` are equally important for defining `SubjectName` (e.g., an anime character or a specific style) as those from `3_SubjectName`. However, there are 15 images in `1_SubjectName`, making `1_SubjectName` 3 times heavier than `3_SubjectName` in terms of weight. The magic numbers in the folder name prefixes ("repeats") - `1_` and `3_` - are used as multipliers for the number of images in each folder. So, the total steps per folder are:

```
1_SubjectName: 15 images * 1 repeat  # 15 steps
3_SubjectName: 5 images  * 3 repeats # 15 steps too
```

Now, each folder is equally important for training the subject `SubjectName`, and the total steps for the subject are 30.
"Epochs" is just a multiplier for the total number of steps. For instance, if we want to train a model with a total of 3000 steps, we set epochs to 100 here.

The dataset (or training images) should:

- be of the same size,
- have the same size as specified on the launch of training,
- be placed in subset folder(s) named `N_SubjectName`, where:
  - `N` is the number of repeats for this folder,
  - `SubjectName` is a unique tag for triggering this subject with a prompt,
- be provided with caption files (same name as the image, but with a `.txt` extension); see the layout sketch below.
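
For example, a subset folder prepared this way might look like the following (the image file names are hypothetical):

```
images/
  2_SubjectName/
    photo-001.png
    photo-001.txt # caption for photo-001.png
    photo-002.png
    photo-002.txt # caption for photo-002.png
```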
Images of the same size are not strictly necessary: kohya_ss scripts support "buckets", cropping input images into tiles automatically while learning, but it's slower.
You can use the Dataset Tag Editor (for A1111) for interrogating captions.
As for the total number of training steps: to be honest, I don't know. Some articles suggest a magic number of 2500 steps for training a LoRA.
Choose the base model that is closest in style to your LoRA.

- `train_network.py` trains LoRA with SD 1.5-based models.
- `sdxl_train_network.py` trains LoRA with SDXL-based models.
You can find the base model version on the page of the preferred base model at CivitAI.
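
If you prefer to skip the interactive wrapper, a direct call to the kohya_ss training script might look roughly like this. This is only a sketch: the model and data paths are hypothetical, and the exact set of flags can differ between kohya_ss versions, so check `python3 train_network.py --help` first.

```bash
# Hypothetical direct invocation of kohya_ss LoRA training for an SD 1.5 model,
# using values matching the defaults of sd-train-lora-fzf.sh.
python3 train_network.py \
  --pretrained_model_name_or_path "${SD_MODELS_PATH}/base-model.safetensors" \
  --train_data_dir "./images" \
  --output_dir "./output" \
  --resolution "512,512" \
  --network_module "networks.lora" \
  --network_dim 128 \
  --network_alpha 128 \
  --learning_rate 0.0001 \
  --unet_lr 0.0001 \
  --text_encoder_lr 5e-5 \
  --noise_offset 0.0 \
  --seed 12345 \
  --max_train_epochs 3
```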
Training width and height are the resolution of the training images. They must be resized and cropped to those values.
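
One way to resize and crop images to the training resolution is ImageMagick; this is a sketch, assuming ImageMagick is installed and the images are PNG files:

```bash
# Resize each image so that its shorter side becomes 512 pixels,
# then center-crop it to exactly 512x512 (edits the files in place).
mogrify -resize 512x512^ -gravity center -extent 512x512 *.png
```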
The training seed is random, so choose any. But you can enter the same seed each time for a clearer comparison of training results.
The higher the network dimension, the more "cooked" (taking more effect) the LoRA. You could start with 2 and increase by powers of 2:

- 2
- 4
- 8
- 16
- ...
- 128
- ...
Try to set the network alpha equal to the network dimension ("network dim") or less.
For the learning rate, start with 0.0001 and then decrease the value by 2 times:

- 0.0001
- 0.00005
- 0.000025
- ...
Try to set the U-Net learning rate equal to the learning rate or less.
For the text encoder learning rate, start with 5e-5 and then decrease the value. The notation 5e-5 means 5 * (10 ** -5), or 0.00005.
You can check this by formatting the value with Python:

```console
$ python3 -c 'print(format(5e-5, "f"))'
0.000050
```

Writing it in this exponent notation is not strictly necessary, but it's often used in tutorials for learning rate values. I don't know why.
The default noise offset is 0.0, but some tutorials suggest 0.1 for more contrast in images generated with the trained LoRA. However, I didn't notice that: generated images were even paler with a noise offset of 0.1 than with 0.0.