Latent Consistency LoRAs with BentoML
This project demonstrates how to deploy a REST API server for Stable Diffusion that generates images in only a few inference steps by using Latent Consistency Model (LCM) LoRAs. We'll use BentoML to turn the Hugging Face example for Latent Consistency LoRAs into a deployable Service.
Prerequisites
- You have installed Python 3.8+ and pip. See the Python downloads page to learn more.
- You have a basic understanding of key concepts in BentoML, such as Services. We recommend you read Quickstart first.
- (Optional) We recommend creating a virtual environment to isolate this project's dependencies. See the Conda documentation or the Python documentation for details.
Install dependencies
git clone https://github.com/bentoml/BentoLCM.git
cd BentoLCM
pip install -r requirements.txt
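If you want to confirm the environment before starting the Service, a quick check like the following may help. It is a minimal sketch and not part of BentoLCM; it assumes bentoml is among the packages installed from requirements.txt, so adjust the package list to match that file.

```python
# check_env.py: a minimal pre-flight check (a sketch, not part of BentoLCM)
import importlib.util
import sys


def check_environment(min_python=(3, 8), packages=("bentoml",)):
    """Return a dict mapping each requirement to True (met) or False (not met).

    Assumption: "bentoml" is installed by requirements.txt; extend `packages`
    with any other top-level modules listed there.
    """
    report = {"python>=%d.%d" % min_python: sys.version_info[:2] >= tuple(min_python)}
    for name in packages:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'not satisfied'}")
```

Running python check_env.py prints one line per requirement, which makes a missing install easy to spot before bentoml serve fails.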
Run the BentoML Service
We have defined a BentoML Service in service.py. Run bentoml serve in your project directory to start the Service.
bentoml serve .
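Loading the model weights can take a while after bentoml serve starts, so a script that calls the Service may want to wait until it reports ready. The sketch below is a hypothetical helper (wait_until_ready is not part of BentoML or this project) and assumes the Service exposes BentoML's default /readyz endpoint on port 3000.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url="http://localhost:3000", timeout=120.0, interval=2.0):
    """Poll <base_url>/readyz until it returns HTTP 200 or `timeout` seconds pass.

    Assumption: the Service exposes BentoML's default /readyz endpoint.
    Returns True once the Service is ready, False if the deadline is reached.
    """
    url = base_url.rstrip("/") + "/readyz"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError (e.g. connection refused, 503) are OSError
            # subclasses; HTTPException covers malformed responses.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Polling with a deadline keeps a CI job or script from hanging forever if the model fails to load.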
The server is now active at http://localhost:3000. You can interact with it through the Swagger UI or by sending HTTP requests, for example with curl:
curl -X 'POST' \
'http://localhost:3000/txt2img' \
-H 'accept: image/*' \
-H 'Content-Type: application/json' \
-d '{
"prompt": "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux"
}' -o out.jpg
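The same request can also be sent from Python using only the standard library. This sketch mirrors the curl example above; the helper names build_txt2img_request and txt2img are hypothetical, and it assumes the /txt2img endpoint accepts a JSON body with a prompt field and returns the raw image bytes, as the curl call suggests.

```python
import json
import urllib.request


def build_txt2img_request(prompt, base_url="http://localhost:3000"):
    """Build a POST request equivalent to the curl example above."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/txt2img",
        data=body,
        headers={"Accept": "image/*", "Content-Type": "application/json"},
        method="POST",
    )


def txt2img(prompt, out_path="out.jpg", base_url="http://localhost:3000", timeout=300):
    """Send the request and save the returned image bytes to `out_path`."""
    request = build_txt2img_request(prompt, base_url)
    # Image generation may take a while, hence the generous default timeout.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        image_bytes = resp.read()
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return out_path
```

For example, txt2img("close-up photography of old man standing in the rain at night") writes out.jpg to the current directory. Changing base_url should let the same helper target another host, such as a deployed endpoint.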
Deploy to production
After the Service is ready, you can deploy the application to BentoCloud for better management and scalability. The build options for your application are defined in a configuration YAML file (bentofile.yaml), which BentoML uses to package the application into a Bento. See Bento build options to learn more.
Make sure you are logged in to BentoCloud (for example, with bentoml cloud login), then run the following command in your project directory to deploy the application.
bentoml deploy .
Once the application is up and running on BentoCloud, you can access it via the exposed URL.
Note: Alternatively, you can use BentoML to build a Docker image for a custom deployment: run bentoml build to package the Bento, then bentoml containerize to turn it into an image.