DeepFloyd IF with Multiple GPUs

Serve a state-of-the-art text-to-image diffusion model across multiple GPUs with ease.
Powered by BentoML 🍱 + StabilityAI 🎨 + HuggingFace 🤗

📖 Introduction 📖

  • 🧪 Stable Diffusion: Stable Diffusion is a deep learning, text-to-image model primarily used to generate detailed images conditioned on text descriptions.

  • 🔮 IF by DeepFloyd Lab: IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding.

  • 🚀 BentoML with IF and GPUs: This project demonstrates how BentoML can serve IF models easily across multiple GPUs.

  • 🎛️ Interactive Experience with Gradio UI: You can play with the hosted IF model through an interactive Gradio UI.

๐Ÿƒโ€โ™‚๏ธ Running the Service ๐Ÿƒโ€โ™‚๏ธ

Prerequisites

To run this project locally, you will need the following:

  • Python 3.8+
  • pip installed
  • At least two GPUs with 16GB VRAM each, or one GPU with 40GB VRAM (see the VRAM check below)
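
If you are unsure what your machine provides, the short check below prints each visible CUDA device and its total VRAM. It is only a convenience sketch and assumes PyTorch is installed (it is needed to run the IF pipelines anyway):

import torch

# Print every visible CUDA device and its total VRAM, to help decide
# how to spread the three IF stages across GPUs.
if not torch.cuda.is_available():
    print("No CUDA device detected")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024 ** 3:.1f} GB VRAM")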

Installing Dependencies

It is recommended to use a virtual environment in your Python projects for dependency isolation. Run the following to install the dependencies:

pip install -r requirements.txt

Import the IF Models

To download the IF models to your local BentoML model store, run:

python import_models.py
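
Conceptually, this step downloads the pipelines that make up IF (the cascade stages plus an upscaler) from HuggingFace and registers them with BentoML. The snippet below is only a rough sketch of that idea; the store names, model identifiers, and the bentoml.diffusers.import_model call are assumptions, so treat import_models.py as the source of truth:

import bentoml

# Hypothetical store names and model IDs; the actual script may differ.
bentoml.diffusers.import_model("if-stage1", "DeepFloyd/IF-I-XL-v1.0")
bentoml.diffusers.import_model("if-stage2", "DeepFloyd/IF-II-L-v1.0")
bentoml.diffusers.import_model("if-stage3", "stabilityai/stable-diffusion-x4-upscaler")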

However, if you have never downloaded models from HuggingFace via the command line before, you may need to authenticate first using the following commands:

pip install -U huggingface_hub
huggingface-cli login
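
If you prefer to authenticate from Python rather than the CLI, huggingface_hub provides an equivalent login helper:

from huggingface_hub import login

# Caches your HuggingFace access token locally, same as `huggingface-cli login`.
# Pass token="hf_..." to log in non-interactively; otherwise it prompts for one.
login()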

Run the Web Server with Gradio

Run the server locally with a web UI powered by Gradio:

# For a GPU with more than 40GB VRAM, run all models on the same GPU
python start-server.py

# For two Tesla T4 GPUs with 15GB VRAM each,
# assign the stage1 model to the first GPU,
# and the stage2 and stage3 models to the second GPU
python start-server.py --stage1-gpu=0 --stage2-gpu=1 --stage3-gpu=1

# For one Tesla T4 with 15GB VRAM and two additional GPUs with smaller VRAM,
# assign the stage1 model to the T4,
# and the stage2 and stage3 models to the second and third GPUs respectively
python start-server.py --stage1-gpu=0 --stage2-gpu=1 --stage3-gpu=2

You can then visit the web UI at http://localhost:7860. BentoML's API endpoint is also accessible at http://localhost:3000. To see all the options you can change (such as the server port), run python start-server.py --help.
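
Once the server is up, the BentoML endpoint can also be called programmatically. The sketch below uses requests with an assumed route name and payload shape; check the service definition (or the API docs served on port 3000) for the actual endpoint and response format:

import requests

# "/generate" and the JSON fields are illustrative assumptions only.
response = requests.post(
    "http://localhost:3000/generate",
    json={
        "prompt": "head shot of a woman standing under street lights, ultra realistic",
        "negative_prompt": "blurry, bad anatomy, watermark",
    },
    timeout=600,
)
response.raise_for_status()
# Depending on the service definition, the body may be raw image bytes or JSON.
with open("output.png", "wb") as f:
    f.write(response.content)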

Example Prompt

Prompt

orange and black, head shot of a woman standing under street lights, dark theme, 
Frank Miller, cinema, ultra realistic, ambiance, insanely detailed and intricate, 
hyper realistic, 8k resolution, photorealistic, highly textured, intricate details

Negative Prompt

 tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, 
 mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, 
 cross-eye, body out of frame, blurry, bad art, bad anatomy, blurred, text, 
 watermark, grainy

Results

(generated image from the prompts above)

🚀 Bringing it to Production 🚀

Because this project requires large GPU devices that are typically not available locally, it is particularly well suited to deployment on ☁️ BentoCloud, a managed, distributed compute platform for machine learning serving.

Otherwise, BentoML offers a number of options for deploying and hosting online ML services in production; learn more at Deploying Bento.

👥 Community 👥

BentoML has a thriving open source community where thousands of ML/AI practitioners are contributing to the project, helping other users and discussing the future of AI. 👉 Pop into our Slack community!