/voltaML-fast-stable-diffusion

A lightweight library that accelerates Stable Diffusion and Dreambooth models for fast inference with a single line of code 🔥 🔥

Primary language: Python. License: Apache-2.0

🔥 🔥 voltaML-fast-stable-diffusion webUI 🔥 🔥

Accelerate your machine learning and deep learning models by up to 10X

A lightweight library that turns Stable Diffusion and Dreambooth models into fast inference models, with a single click in the webUI or a single line of code.

Setup webUI

Docker setup (if required)

Set up Docker on Ubuntu using these instructions.

Set up Docker on Windows using these instructions.

Folder setup

Create two folders on your local computer, one called "engine" and one called "output". For example:

C:\voltaml\engine 
C:\voltaml\output
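
If you prefer to script this step, the folder setup can be sketched in Python as below. This is a minimal sketch, not part of voltaML: the helper name `create_volta_folders` is hypothetical, and the base directory is whatever location you choose.

```python
from pathlib import Path


def create_volta_folders(base_dir):
    """Create 'engine' and 'output' under base_dir and return both paths.

    Hypothetical helper for illustration; voltaML does not ship it.
    """
    base = Path(base_dir)
    engine = base / "engine"
    output = base / "output"
    for folder in (engine, output):
        # parents=True creates base_dir if needed; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
    return engine, output
```

For the example paths above, this would be called as `create_volta_folders(r"C:\voltaml")`.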

Launch voltaML container

sudo docker run --gpus=all -v "path-to-engine-folder":/workspace/voltaML-fast-stable-diffusion/engine -v "path-to-output-folder":/workspace/voltaML-fast-stable-diffusion/static/output -p 5003:5003 -it voltaml/volta_diffusion_webui:v0.2

⚠️ You need to mount a local volume to save your work to your system. Otherwise, your work is deleted when you exit the container.
⚠️ To save your work inside the container itself, commit the container before you exit it.
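
To avoid typos in the two volume mounts, the launch command above can be assembled from your folder paths. The sketch below only builds the command. It does not run Docker. The function name `build_docker_command` and the `host_port` and `use_sudo` parameters are assumptions for illustration; the image tag, container paths, and port 5003 are copied from the command above.

```python
import shlex

IMAGE = "voltaml/volta_diffusion_webui:v0.2"
CONTAINER_ROOT = "/workspace/voltaML-fast-stable-diffusion"
CONTAINER_PORT = 5003  # port used in the launch command above


def build_docker_command(engine_dir, output_dir, host_port=5003, use_sudo=True):
    """Return the docker run command as a list of arguments.

    Hypothetical helper: it mirrors the documented command, nothing more.
    """
    cmd = ["sudo"] if use_sudo else []
    cmd += [
        "docker", "run", "--gpus=all",
        # local engine folder -> engine folder inside the container
        "-v", f"{engine_dir}:{CONTAINER_ROOT}/engine",
        # local output folder -> webUI output folder inside the container
        "-v", f"{output_dir}:{CONTAINER_ROOT}/static/output",
        "-p", f"{host_port}:{CONTAINER_PORT}",
        "-it", IMAGE,
    ]
    return cmd
```

Calling `print(shlex.join(build_docker_command("/home/me/voltaml/engine", "/home/me/voltaml/output")))` prints a command line you can paste into a terminal; the `/home/me/...` paths are placeholders for your own folders.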

How to use webUI

  1. Once you launch the container, a Flask app starts and prints a URL. Copy that URL and paste it into your browser to open the webUI on your local host.

  2. There are two backends for running Stable Diffusion: PyTorch and TensorRT (the fastest version).

  3. To run PyTorch inference, select a model. The model is downloaded into the container, which takes a few minutes, and the inference result is then displayed. Downloaded models are shown in the webUI.

  4. To run TensorRT inference, go to the Accelerate tab, pick a model from our model hub, and click the Accelerate button.

  5. Once acceleration is done, the model shows up in your TensorRT drop-down menu.

  6. Switch your backend to TensorRT, select the model and enjoy the fastest outputs 🚀🚀
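
If the page from step 1 does not load, it can help to check whether anything is listening on the mapped port. The snippet below is a generic TCP check, not a voltaML API. The function name `webui_is_reachable` is hypothetical, and the default port 5003 assumes you kept the `-p 5003:5003` mapping from the launch command.

```python
import socket


def webui_is_reachable(host="127.0.0.1", port=5003, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    This only shows that something accepts connections on the port;
    it does not verify that the webUI itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, timeouts, and unreachable hosts
        return False
```

A `False` result usually means the container is not running or the port mapping differs from the one assumed here.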

Benchmark

The benchmarks below were run generating a 512x512 image at batch size 1 for 50 iterations. Results are in iterations per second (it/s); higher is better.

| Model | T4 (it/s) | A10 (it/s) | A100 (it/s) | 4090 (it/s) | 3090 (it/s) | 2080Ti (it/s) |
|---|---|---|---|---|---|---|
| PyTorch | 4.3 | 8.8 | 15.1 | 19 | 11 | 8 |
| Flash attention xformers | 5.5 | 15.6 | 27.5 | 28 | 15.7 | N/A |
| AITemplate | Not supported | 26.7 | 55 | 60 | N/A | Not supported |
| VoltaML (TRT-Flash) | 11.4 | 29.2 | 62.8 | 85 | 44.7 | 26.2 |
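
To read the table as relative speedups, the PyTorch and VoltaML rows can be compared directly. The sketch below uses only the numbers from the table. The per-image time assumes that the 50 iterations are the denoising steps for one image and ignores any other overhead, so treat it as a rough estimate rather than a measured latency.

```python
GPUS = ["T4", "A10", "A100", "4090", "3090", "2080Ti"]
PYTORCH = [4.3, 8.8, 15.1, 19, 11, 8]          # it/s, from the table
VOLTAML = [11.4, 29.2, 62.8, 85, 44.7, 26.2]   # it/s, from the table
STEPS = 50  # iterations per image in the benchmark


def speedups(baseline, accelerated):
    """Ratio of accelerated to baseline throughput, per GPU."""
    return [fast / slow for slow, fast in zip(baseline, accelerated)]


def seconds_per_image(its_per_sec, steps=STEPS):
    """Estimated time for one image, assuming time = steps / (it/s)."""
    return steps / its_per_sec


for gpu, base, fast, ratio in zip(GPUS, PYTORCH, VOLTAML, speedups(PYTORCH, VOLTAML)):
    print(f"{gpu:>7}: {ratio:.2f}x, "
          f"{seconds_per_image(base):.1f}s -> {seconds_per_image(fast):.1f}s per image")
```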

⚠️ ‼️ Warnings/Caveats

This is v0.1 of the product. Things might break. A lot of improvements are on the way, so please bear with us.

  1. This only works on NVIDIA GPUs with compute capability 7.5 or higher.
  2. Cards with less than 12GB of VRAM will have issues with acceleration because the conversion requires a lot of memory. We're working on resolving this in our next release.
  3. While a model is accelerating, no other functionality will work, since the GPU is fully occupied.
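
The first two caveats can be turned into a quick pre-flight check. The function below is a hypothetical helper, not part of voltaML, and both thresholds are parameters. Its default treats compute capability 7.5 itself as supported, which is an assumption based on the T4 and 2080Ti (both 7.5) appearing in the benchmark table.

```python
def acceleration_warnings(capability, vram_gb, min_capability=(7, 5), min_vram_gb=12):
    """Return a list of warnings for a GPU, based on the caveats above.

    capability is a (major, minor) tuple, e.g. (8, 6).
    Hypothetical helper; thresholds are assumptions taken from this README.
    """
    warnings = []
    # tuples compare element by element, so (7, 0) < (7, 5) < (8, 0)
    if tuple(capability) < tuple(min_capability):
        warnings.append(
            f"compute capability {capability[0]}.{capability[1]} is below "
            f"{min_capability[0]}.{min_capability[1]}"
        )
    if vram_gb < min_vram_gb:
        warnings.append(
            f"{vram_gb}GB VRAM is below {min_vram_gb}GB; acceleration may fail"
        )
    return warnings
```

With PyTorch installed, the inputs could come from `torch.cuda.get_device_capability()` and `torch.cuda.get_device_properties(0).total_memory / 1024**3`. An empty list means neither caveat applies.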