Latent Consistency LoRAs with BentoML

This project demonstrates how to deploy a REST API server for Stable Diffusion that generates images in only a few inference steps. We'll use BentoML to turn the Hugging Face example for Latent Consistency LoRAs into a Service.

Prerequisites

You have installed Python 3.8+ and pip. See the Python downloads page to learn more.
You have a basic understanding of key concepts in BentoML, such as Services. We recommend reading the Quickstart first.
(Optional) We recommend creating a virtual environment to isolate this project's dependencies. See the Conda documentation or the Python documentation for details.

Install dependencies

git clone https://github.com/bentoml/BentoLCM.git
cd BentoLCM
pip install -r requirements.txt

Run the BentoML Service

We have defined a BentoML Service in service.py. Run bentoml serve in your project directory to start the Service.

bentoml serve .

The server is now active at http://localhost:3000. You can interact with it using the Swagger UI or in other ways.

CURL

curl -X 'POST' \
  'http://localhost:3000/txt2img' \
  -H 'accept: image/*' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux"
}' -o out.jpg
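
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the Service is running locally on port 3000 and that `/txt2img` returns raw image bytes, as the curl example above suggests. The helper names `build_txt2img_request` and `save_image`, and the 300-second timeout, are illustrative choices and not part of the project.

```python
import json
import urllib.request


def build_txt2img_request(prompt, base_url="http://localhost:3000"):
    """Build a POST request equivalent to the curl command above (hypothetical helper)."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/txt2img",
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "image/*"},
        method="POST",
    )


def save_image(prompt, out_path="out.jpg", base_url="http://localhost:3000", timeout=300):
    """Send the prompt to the Service and write the returned bytes to out_path.

    The timeout is an assumption: image generation can take a while,
    so a generous value is used here.
    """
    request = build_txt2img_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        image_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return out_path
```

With the server running, calling `save_image("close-up photography of old man standing in the rain at night")` should write the generated image to `out.jpg` in the current directory.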

Deploy to production

After the Service is ready, you can deploy the application to BentoCloud for better management and scalability. The build options for your application are defined in a YAML configuration file (bentofile.yaml), which BentoML uses to package the application into a Bento. See Bento build options to learn more.

Make sure you have logged in to BentoCloud, then run the following command in your project directory to deploy the application to BentoCloud.

bentoml deploy .

Once the application is up and running on BentoCloud, you can access it via the exposed URL.

Note: Alternatively, you can use BentoML to generate a Docker image for a custom deployment.