- data\ : Contains datasets
- logs\ : Contains all logs for experiments
- requirements.txt : PIP requirements for this project
- latent_dataset_creator.ipynb : Notebook for creating latent datasets
- latent_selfie2anime.zip : Premade latent dataset for selfie2anime
- gaussian_diffusion.py : Diffusion algorithms from OpenAI
- mdtv2.py : Official MDTv2 architecture
- latent_dataset.py : Custom PyTorch dataset object for latent datasets
- img_train.py : Script for training a model
- img_sample.py : Script for generating images from a trained model
- img_translate.py : Script for doing image-to-image translation from a trained model
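For orientation, a latent dataset object along the lines of latent_dataset.py might look roughly like this. This is a hypothetical sketch, not the repo's actual implementation; the on-disk format (one .pt file per image, holding a latent tensor and a class label) is assumed.

```python
import torch
from torch.utils.data import Dataset
from pathlib import Path

class LatentDataset(Dataset):
    """Loads precomputed latents (one .pt file per image) plus a class label.
    The file format here is an assumption for illustration."""

    def __init__(self, data_dir):
        self.files = sorted(Path(data_dir).glob("*.pt"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        record = torch.load(self.files[idx])
        return record["latent"], record["cond"]

# Create a dummy latent file so the example runs end to end
Path("demo_latents").mkdir(exist_ok=True)
torch.save({"latent": torch.zeros(4, 32, 32), "cond": 1}, "demo_latents/0.pt")

ds = LatentDataset("demo_latents")
latent, cond = ds[0]
```

Loading latents instead of raw images keeps each training step cheap, since the expensive image encoding happens once, up front.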
- Find a dataset you want to transform
- Run the latent dataset creator with your dataset
Since the selfie2anime dataset is small enough, we provide its premade latent dataset, so there is no need to create one to test this repo. Simply unzip latent_selfie2anime.zip.
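The encoding step behind the latent dataset creator can be sketched as follows. This is a minimal illustration, not the notebook's code: TinyEncoder is a hypothetical stand-in for a real pretrained image encoder (e.g. a VAE encoder), and the output file layout is assumed.

```python
import torch
import torch.nn as nn
from pathlib import Path

class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained latent encoder (e.g. a VAE)."""
    def __init__(self):
        super().__init__()
        # 3-channel image -> 4-channel latent at 1/8 resolution
        self.conv = nn.Conv2d(3, 4, kernel_size=8, stride=8)

    def forward(self, x):
        return self.conv(x)

@torch.no_grad()
def encode_folder(images, out_dir, label):
    """Encode each image into a latent and save one .pt record per image."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    encoder = TinyEncoder().eval()
    for i, img in enumerate(images):
        latent = encoder(img.unsqueeze(0)).squeeze(0)
        torch.save({"latent": latent, "cond": label}, out / f"{label}_{i}.pt")

# Example: two fake 256x256 RGB images for one domain (label 0)
imgs = torch.rand(2, 3, 256, 256)
encode_folder(imgs, "data/latent_demo", label=0)
```

With a real VAE, a 256x256x3 image shrinks to a 32x32x4 latent, which is what makes training the diffusion model tractable on a single GPU.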
By default our repo trains the smallest model. It is still quite large, however, so make sure you have a capable enough GPU.
Example:
python img_train.py --data-dir="data\latent_selfie2anime"
The most recent model will be saved in the logs directory, under a subdirectory named with the start date. Images generated from the same noise will also be output every epoch, so you can observe the training progress!
A pretrained model is too large to include in the project, so you need to train your own! On a decent GPU, the smaller model trained on selfie2anime should not take too long: perhaps a few hours before you see acceptable results, with longer training giving better and better results.
Example:
python img_sample.py --model-path="data\model.pt" --num-samples=8 --sample-steps=250
sample-steps: The number of denoising steps used for generation, from 0 to 1000. Higher values give better quality but take more processing time.
Results will be output in the logs directory.
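Using fewer than 1000 sample steps trades quality for speed by evaluating the model on only a subset of the training timesteps. A rough sketch of how such a respaced schedule can be computed (the spacing here is illustrative, not necessarily what gaussian_diffusion.py does):

```python
def respaced_timesteps(num_train_steps=1000, sample_steps=250):
    """Pick `sample_steps` evenly spaced timesteps out of the full schedule."""
    stride = num_train_steps / sample_steps
    steps = [round(i * stride) for i in range(sample_steps)]
    # Deduplicate, then reverse: sampling runs from high t down to 0
    return sorted(set(steps), reverse=True)

ts = respaced_timesteps(1000, 250)
print(len(ts), ts[0], ts[-1])  # 250 timesteps, from 996 down to 0
```

At --sample-steps=250 the sampler visits every fourth timestep, cutting generation time to roughly a quarter of the full schedule.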
Example:
python img_translate.py --model-path="model.pt" --src-img-path="my_img.png" --strength=0.4 --cond=0
strength: The amount of noise to add to the image, from 0 to 1, where higher means more noise.
cond: The class/domain identifier; for selfie2anime, 0=anime and 1=human.
To do image-to-image translation within the same domain, meaning you're making variations of the same image, simply use the image's own domain as cond. If you're translating between domains, use the opposite cond.
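The strength parameter corresponds to SDEdit-style partial noising: the source image is diffused forward to timestep t ≈ strength × 1000, then denoised from there under the target cond. A rough sketch of the forward-noising half, assuming a standard linear beta schedule (the repo's actual schedule and constants may differ):

```python
import torch

def noise_to_strength(x0, strength, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Diffuse x0 forward to timestep t = strength * num_steps via q(x_t | x_0)."""
    betas = torch.linspace(beta_start, beta_end, num_steps)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    t = min(int(strength * num_steps), num_steps - 1)
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(x0)
    # Standard DDPM forward process: sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps
    xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return xt, t  # denoising then runs from t back down to 0

x0 = torch.zeros(1, 4, 32, 32)  # e.g. a latent image
xt, t = noise_to_strength(x0, strength=0.4)
```

Lower strength keeps more of the source image's structure (fewer denoising steps to run), while strength near 1 approaches generating from pure noise.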