This repo contains the logic to run inference for the popular Stable Diffusion deep learning model in C#. Stable Diffusion models take a text prompt and create an image that represents the text. See the example below:
For the example prompt below, the CLIP text encoder (`text_encoder`) turns the prompt into a text embedding that connects text to image. A random noise latent is created and then iteratively denoised by the `unet` model and a scheduler algorithm, guided by the text embedding, to produce a latent image that represents the prompt. Finally, the `vae_decoder` model decodes that latent image into the final output image.
"make a picture of green tree with flowers around it and a red sky"
[Figure: auto-generated random latent seed input | resulting image output]
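The random starting latent described above can be sketched in plain C#. This is a minimal illustration, not this repo's code. It assumes a 512x512 output, so the Stable Diffusion v1.x latent has shape [1, 4, 64, 64] (4 channels, 8x smaller than the image). It fills the latent with standard normal samples via the Box-Muller transform and scales them by an `initNoiseSigma` value that depends on the scheduler. The helper name `CreateRandomLatent` and the placeholder sigma of 1.0 are hypothetical.

```csharp
using System;

// Hypothetical helper (not from this repo): builds a flat [batch, channels, height, width]
// latent filled with Gaussian noise scaled by the scheduler's initial noise sigma.
float[] CreateRandomLatent(int seed, int batch, int channels, int height, int width, float initNoiseSigma)
{
    var random = new Random(seed); // same seed -> same latent
    var latent = new float[batch * channels * height * width];
    for (int i = 0; i < latent.Length; i++)
    {
        // Box-Muller transform: two uniform samples -> one standard normal sample.
        double u1 = 1.0 - random.NextDouble(); // in (0, 1], so Log never sees 0
        double u2 = random.NextDouble();
        double standardNormal = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        latent[i] = (float)(standardNormal * initNoiseSigma);
    }
    return latent;
}

// ASSUMPTION: 512x512 image, 4 latent channels, 8x downsampling (Stable Diffusion v1.x).
const int Batch = 1, Channels = 4, ImageSize = 512, LatentScale = 8;
int latentSize = ImageSize / LatentScale; // 64

// ASSUMPTION: initNoiseSigma = 1.0f is a placeholder; the real value comes from the scheduler.
float[] latents = CreateRandomLatent(42, Batch, Channels, latentSize, latentSize, 1.0f);
Console.WriteLine($"Latent length: {latents.Length}"); // prints "Latent length: 16384"
```

Fixing the seed makes the starting latent reproducible, which is what lets the same prompt and settings regenerate the same image; changing the seed gives different starting noise for the same prompt.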
- A GPU-enabled machine with the CUDA execution provider (EP) configured. This was built on an RTX 3070 and has not been tested on anything smaller. Follow the tutorial to configure CUDA and cuDNN for GPU with ONNX Runtime and C# on Windows 11.
Download the ONNX Stable Diffusion models from Hugging Face.
Once you have selected a model version repo, click `Files and Versions`, then select the `ONNX` branch. If there isn't an ONNX model branch available, use the `main` branch and convert it to ONNX. See the ONNX conversion tutorial for PyTorch for more information.
- Clone the repo:
  ```
  git lfs install
  git clone https://huggingface.co/CompVis/stable-diffusion-v1-4 -b onnx
  ```
- Copy the folders with the ONNX files to the C# project folder `\StableDiffusion\StableDiffusion`. The folders to copy are: `unet`, `vae_decoder`, `text_encoder`, `safety_checker`.
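Before running the project, a quick check that the copied folders are in place can save a confusing runtime error. The sketch below is a hypothetical helper (`FindMissingModels` is not part of the repo). It assumes each folder contains a file named `model.onnx`, as in the Hugging Face `onnx` branch; adjust the file name if your export differs. It takes the project folder as its first command-line argument and otherwise uses the current directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Folder names taken from the copy step above.
string[] requiredFolders = { "unet", "vae_decoder", "text_encoder", "safety_checker" };

// Hypothetical helper: returns the expected model paths that do not exist under rootDir.
// ASSUMPTION: each folder contains a file named "model.onnx".
List<string> FindMissingModels(string rootDir)
{
    var missing = new List<string>();
    foreach (string folder in requiredFolders)
    {
        string modelPath = Path.Combine(rootDir, folder, "model.onnx");
        if (!File.Exists(modelPath))
        {
            missing.Add(modelPath);
        }
    }
    return missing;
}

// Pass the \StableDiffusion\StableDiffusion project folder as the first argument,
// or run from inside it so the current directory is used.
string projectDir = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
List<string> missingModels = FindMissingModels(projectDir);
if (missingModels.Count == 0)
{
    Console.WriteLine("All model folders found.");
}
else
{
    foreach (string path in missingModels)
    {
        Console.WriteLine($"Missing: {path}");
    }
}
```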