/stable-diffusion.cpp

Stable Diffusion in pure C/C++

Primary LanguageC++MIT LicenseMIT

stable-diffusion.cpp

Inference of Stable Diffusion in pure C/C++

Features

  • Plain C/C++ implementation based on ggml, working in the same way as llama.cpp
  • 16-bit, 32-bit float support
  • 4-bit, 5-bit and 8-bit integer quantization support
  • Accelerated memory-efficient CPU inference
  • AVX, AVX2 and AVX512 support for x86 architectures
  • Original txt2img mode
  • Negative prompt
  • Sampling method
    • Euler A
  • Supported platforms
    • Linux
    • Mac OS
    • Windows

TODO

  • Original img2img mode
  • More sampling methods
  • GPU support
  • Make inference faster
    • The current implementation of ggml_conv_2d is slow and has high memory usage
  • Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
  • stable-diffusion-webui style tokenizer (eg: token weighting, ...)
  • LoRA support
  • k-quants support

Usage

Get the Code

git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp

Convert weights

  • download original weights(.ckpt or .safetensors). For example

    curl -L -O https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
    # curl -L -O https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
  • convert weights to ggml model format

    cd models
    pip install -r requirements.txt
    python convert.py [path to weights] --out_type [output precision]
    # For example, python convert.py sd-v1-4.ckpt --out_type f16

Quantization

You can specify the output model format using the --out_type parameter

  • f16 for 16-bit floating-point
  • f32 for 32-bit floating-point
  • q8_0 for 8-bit integer quantization
  • q5_0 or q5_1 for 5-bit integer quantization
  • q4_0 or q4_1 for 4-bit integer quantization

Build

mkdir build
cd build
cmake ..
cmake --build . --config Release

Using OpenBLAS

cmake .. -DGGML_OPENBLAS=ON
cmake --build . --config Release

Run

usage: ./sd [arguments]

arguments:
  -h, --help                         show this help message and exit
  -t, --threads N                    number of threads to use during computation (default: -1).
                                     If threads <= 0, then threads will be set to the number of CPU cores
  -m, --model [MODEL]                path to model
  -o, --output OUTPUT                path to write result image to (default: .\output.png)
  -p, --prompt [PROMPT]              the prompt to render
  -n, --negative-prompt PROMPT       the negative prompt (default: "")
  --cfg-scale SCALE                  unconditional guidance scale: (default: 7.0)
  -H, --height H                     image height, in pixel space (default: 512)
  -W, --width W                      image width, in pixel space (default: 512)
  --sample-method SAMPLE_METHOD      sample method (default: "eular a")
  --steps  STEPS                     number of sample steps (default: 20)
  -s SEED, --seed SEED               RNG seed (default: 42, use random seed for < 0)
  -v, --verbose                      print extra info

For example

./sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat"

Using formats of different precisions will yield results of varying quality.

f32 f16 q8_0 q5_0 q5_1 q4_0 q4_1

Memory/Disk Requirements

precision f32 f16 q8_0 q5_0 q5_1 q4_0 q4_1
Disk 2.8G 2.0G 1.7G 1.6G 1.6G 1.5G 1.5G
Memory(txt2img - 512 x 512) ~4.9G ~4.1G ~3.8G ~3.7G ~3.7G ~3.6G ~3.6G

References