stable-diffusion-deploy

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference and microservices working together via the Lightning Apps framework.



Use AI to inspire your art



Muse

Muse is an open-source Stable Diffusion production server that shows how to deploy diffusion models in a real production environment, with load balancing, GPU inference, performance testing, microservice orchestration and more, all handled by the Lightning Apps framework.

The app is live here.

Full tutorial on how to build this app.


Model

Muse uses the open-source Stable Diffusion model released by Stability AI. We apply a few optimizations to speed up inference, such as batching several prompts into a single model call and running the model under mixed-precision autocast.

Here's a small snippet from our model server:

    import base64
    import time
    from io import BytesIO
    from typing import List

    import numpy as np
    import torch
    from PIL import Image
    from torch import autocast

    # Data, TimeoutException, INFERENCE_REQUEST_TIMEOUT and IMAGE_SIZE are defined
    # elsewhere in the app; predict() is a method of the model server class.

    @torch.inference_mode()
    def predict(self, dreams: List[Data], entry_time: int):
        # reject requests that have already waited longer than the timeout
        if time.time() - entry_time > INFERENCE_REQUEST_TIMEOUT:
            raise TimeoutException()
        if not dreams:
            return []

        # inference settings: the whole batch uses the first request's quality flag
        height = width = IMAGE_SIZE
        num_inference_steps = 50 if dreams[0].high_quality else 25

        prompts = [dream.prompt for dream in dreams]

        if torch.cuda.is_available():
            # GPU inference under mixed-precision autocast
            torch.cuda.empty_cache()
            with autocast("cuda"):
                pil_results = self._model(
                    prompts,
                    height=height,
                    width=width,
                    num_inference_steps=num_inference_steps,
                )
            # replace any image flagged by the safety checker with a warning image
            nsfw_content = self._safety_checker(pil_results)
            for i, nsfw in enumerate(nsfw_content):
                if nsfw:
                    pil_results[i] = Image.open("assets/nsfw-warning.png")
        else:
            # CPU fallback for local development: one random-noise image per prompt
            time.sleep(3)
            pil_results = [
                Image.fromarray(np.random.randint(0, 256, (height, width, 3), dtype="uint8"))
                for _ in prompts
            ]

        # encode each image as a base64 PNG data URL (a fresh buffer per image)
        results = []
        for image in pil_results:
            buffered = BytesIO()
            image.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
            results.append({"image": f"data:image/png;base64,{img_str}"})

        return results
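Each item returned by predict is a dict whose "image" value is a PNG encoded as a base64 data URL. A client can turn it back into raw PNG bytes with the standard library alone. The sketch below assumes exactly the "data:image/png;base64," prefix used by the server; encode_png_data_url and decode_png_data_url are hypothetical helper names, not part of the app.

```python
import base64

PNG_DATA_URL_PREFIX = "data:image/png;base64,"


def encode_png_data_url(png_bytes: bytes) -> str:
    """Wrap raw PNG bytes the same way the server's predict() does."""
    return PNG_DATA_URL_PREFIX + base64.b64encode(png_bytes).decode("utf-8")


def decode_png_data_url(data_url: str) -> bytes:
    """Recover raw PNG bytes from a 'data:image/png;base64,...' string."""
    if not data_url.startswith(PNG_DATA_URL_PREFIX):
        raise ValueError("expected a base64-encoded PNG data URL")
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(data_url[len(PNG_DATA_URL_PREFIX):], validate=True)


# Usage: a response item shaped like {"image": "data:image/png;base64,..."}
result = {"image": encode_png_data_url(b"\x89PNG\r\n\x1a\n")}
png = decode_png_data_url(result["image"])
```

Checking the prefix before decoding makes a wrong content type fail with a clear error instead of producing corrupt bytes.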

Run your own

To run this app locally, follow these steps:

    conda create --name muse_app python=3.9 --yes
    conda activate muse_app

    git clone https://github.com/Lightning-AI/stable-diffusion-deploy.git
    cd stable-diffusion-deploy

    bash dev_install.sh

    # To run the app locally
    python -m lightning run app app.py

    # To run the app on the cloud to share it with your peers and users
    python -m lightning run app app.py --cloud

You can configure Muse to customize the number of workers and the batch size, or to select the Stable Diffusion version of your choice. Learn more.
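One way to apply such settings is to read them from environment variables, the mechanism this README lists for parametrizing the execution environment. The sketch below is hypothetical: MuseSettings, the MUSE_NUM_WORKERS, MUSE_BATCH_SIZE and MUSE_SD_VERSION variable names, and the default values are assumptions for illustration, not the app's actual configuration keys.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MuseSettings:
    """Hypothetical settings object; field names and defaults are illustrative."""

    num_workers: int = 1
    batch_size: int = 4
    sd_version: str = "1.5"  # hypothetical default

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "MuseSettings":
        # fall back to the real process environment when no mapping is given
        env = os.environ if env is None else env
        settings = cls(
            num_workers=int(env.get("MUSE_NUM_WORKERS", "1")),
            batch_size=int(env.get("MUSE_BATCH_SIZE", "4")),
            sd_version=env.get("MUSE_SD_VERSION", "1.5"),
        )
        # validate once, before any GPU worker is started
        if settings.num_workers < 1 or settings.batch_size < 1:
            raise ValueError("num_workers and batch_size must be >= 1")
        return settings


settings = MuseSettings.from_env({"MUSE_NUM_WORKERS": "3"})
```

Passing an explicit mapping, as in the last line, keeps the loader easy to test; in the running app it would read os.environ.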

About this Lightning App

Muse is a blueprint for building diffusion-based production systems with Lightning AI. This app shows you how to:

  • Host a multi-tenant frontend and backend application architecture
  • Build a full React.js UI
  • Orchestrate microservices
  • Pre-provision cloud infrastructure
  • Serve a GPU-powered diffusion model via a REST API
  • Dynamically batch GPU inference requests
  • Use a load balancer that autoscales infrastructure as the load changes
  • Load-test the app with a Lightning Component that uses Locust
  • Parametrize the execution environment with environment variables

[Figure: architecture diagram of Muse]

Slackbot

How to integrate Muse into a Slack workspace

You can integrate this app into your Slack workspace and post generated images to Slack channels.

This app uses the Slack Command Bot Component to interact with Slack commands.

You can also check out this video tutorial:

How to create the Slack Command Bot

Step 1: Go to https://api.slack.com and create an app.

Step 2: Copy the following tokens and secrets from your app's settings at https://api.slack.com/apps. Pass them to the SlackCommandBot class either as arguments or as environment variables.

Required tokens and their environment variables:
  • Client ID (SLACK_CLIENT_ID)
  • Client Secret (CLIENT_SECRET)
  • Signing Secret (SIGNING_SECRET)
  • Bot User OAuth Token (BOT_TOKEN)
  • App-Level Token (SLACK_TOKEN)
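These variables can be checked up front so that a missing secret fails at startup rather than on the first command. The helper below is a hypothetical sketch that reads exactly the five environment variable names listed above; the dictionary keys it returns are illustrative and are not necessarily SlackCommandBot's argument names.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable names as listed in Step 2; the keys are illustrative.
SLACK_ENV_VARS = {
    "client_id": "SLACK_CLIENT_ID",
    "client_secret": "CLIENT_SECRET",
    "signing_secret": "SIGNING_SECRET",
    "bot_token": "BOT_TOKEN",
    "slack_token": "SLACK_TOKEN",
}


def load_slack_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the Slack secrets from the environment, failing if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [var for var in SLACK_ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"missing Slack environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in SLACK_ENV_VARS.items()}
```

Reporting every missing name at once saves a round of trial and error when setting up a new deployment.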

Step 3: Implement the SlackCommandBot.handle_command(...) method to define how your bot responds to commands. The return value is shown only to the user who ran the command.

Step 4 (optional): If you want your Slack app to be publicly distributable, you need to implement SlackCommandBot.save_new_workspace(...), which should save each team_id and its corresponding bot_token to a database.

Inside the handle_command(...) method, you then need to fetch the bot_token that corresponds to the received team_id.
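The storage part of Step 4 can be sketched with Python's built-in sqlite3 module. WorkspaceStore and its methods are hypothetical names, not the SlackCommandBot API: save shows what save_new_workspace(...) might do with a team_id and bot_token, and bot_token_for shows the lookup that handle_command(...) would need.

```python
import sqlite3
from typing import Optional


class WorkspaceStore:
    """Hypothetical team_id -> bot_token store backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS workspaces "
            "(team_id TEXT PRIMARY KEY, bot_token TEXT NOT NULL)"
        )
        self._conn.commit()

    def save(self, team_id: str, bot_token: str) -> None:
        """Insert or update the bot token for a workspace (what save_new_workspace could do)."""
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO workspaces (team_id, bot_token) VALUES (?, ?)",
                (team_id, bot_token),
            )

    def bot_token_for(self, team_id: str) -> Optional[str]:
        """Return the stored token for team_id, or None if the workspace is unknown."""
        row = self._conn.execute(
            "SELECT bot_token FROM workspaces WHERE team_id = ?", (team_id,)
        ).fetchone()
        return row[0] if row else None


store = WorkspaceStore()
store.save("T0001", "xoxb-example-token")
```

An in-memory database keeps the sketch self-contained; a deployed app would pass a file path or use a managed database, and should avoid storing tokens in plain text.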