google-cloud-run-deploy: A Python repository from BentoML

⚠️ BentoCTL project has been deprecated

Please see the latest BentoML documentation for the OCI-container-based deployment workflow: https://docs.bentoml.com/

Google Cloud Run Operator

Cloud Run is Google Cloud's serverless solution for containers. With Cloud Run, you can develop and deploy highly scalable containerized applications on a fully managed serverless platform. Cloud Run is well suited to serving small to medium-sized models: you pay only for the compute you use, and it scales automatically with incoming traffic.

Combining BentoML and bentoctl lets you use Cloud Run with your favourite ML frameworks and manage the infrastructure with Terraform.

Note: This operator is compatible with BentoML version 1.0.0 and above. For older versions, switch to the pre-v1.0 branch and follow the instructions in its README.md.

Quickstart with bentoctl
Configuration Options

Quickstart with bentoctl

This quickstart walks you through deploying a bento to Google Cloud Run. Before you start, go through the prerequisites section and follow its instructions to set everything up.

Prerequisites

Google Cloud CLI - Installation instructions: https://cloud.google.com/sdk/docs/install. Make sure all your gcloud components are up to date by running `gcloud components update`.
Terraform - Terraform is a tool for building, configuring, and managing infrastructure. Installation instructions: https://www.terraform.io/downloads
Docker - Installation instructions: https://docs.docker.com/install
A working bento - this guide uses the iris_classifier bento from the BentoML quickstart guide.
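Before starting, you can confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch using only the Python standard library; `missing_tools` is a hypothetical helper, not part of bentoctl, and it only checks that each executable can be found, not that it is configured or up to date.

```python
import shutil

# Command-line tools the quickstart relies on (taken from the prerequisites list).
REQUIRED_TOOLS = ("gcloud", "terraform", "docker")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the real tools.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit(f"Missing prerequisites: {', '.join(missing)}")
    print("All prerequisite CLIs were found on PATH.")
```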

Steps

Install bentoctl via pip
```
pip install bentoctl
```
Install the operator

Bentoctl installs the official Google Cloud Run operator and its dependencies. The operator contains the Terraform templates and sets up the registries required to deploy to GCP.
```
bentoctl operator install google-cloud-run
```

Initialize deployment with bentoctl

Follow the interactive guide to initialize the deployment project.

```
$ bentoctl init

Bentoctl Interactive Deployment Config Builder

Welcome! You are now in interactive mode.

This mode will help you set up the deployment_config.yaml file required for
deployment. Fill out the appropriate values for the fields.

(deployment config will be saved to: ./deployment_config.yaml)

api_version: v1
name: quickstart
operator: google-cloud-run
template: terraform
spec:
    project_id: bentoml-316710
    region: asia-east1
    port: 3000
    min_instances: 0
    max_instances: 1
    memory: 512M
    cpu: 1
filename for deployment_config [deployment_config.yaml]:
deployment config generated to: deployment_config.yaml
✨ generated template files.
  - ./main.tf
  - ./bentoctl.tfvars
```

This also runs the `bentoctl generate` command for you, which generates two files: main.tf, the Terraform file that specifies the resources to be created, and bentoctl.tfvars, which contains the values for the variables used in main.tf.

Build and push the Docker image to Google Container Registry.
```
bentoctl build -b iris_classifier:latest -f deployment_config.yaml
```
The iris_classifier service is now built and pushed to the container registry, and the required Terraform files have been created. Now we can use Terraform to perform the deployment.

Apply Deployment with Terraform
1. Initialize the Terraform project. This installs the Google Cloud provider and initializes the Terraform working directory.
```
terraform init
```
2. Apply the Terraform project to create the Cloud Run deployment. The -auto-approve flag skips Terraform's interactive confirmation prompt.
```
terraform apply -var-file=bentoctl.tfvars -auto-approve
```
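If you script the deployment (for example in CI), the two Terraform steps can be chained from Python with the standard `subprocess` module. This is a sketch: `apply_deployment` is a hypothetical helper, and it assumes `terraform` is on your PATH and that `workdir` contains the main.tf and bentoctl.tfvars files generated by bentoctl.

```python
import subprocess

# The two Terraform commands from the steps above, in order.
TERRAFORM_STEPS = (
    ["terraform", "init"],
    ["terraform", "apply", "-var-file=bentoctl.tfvars", "-auto-approve"],
)


def apply_deployment(workdir=".", run=subprocess.run):
    """Run `terraform init` and then `terraform apply` in the generated project directory.

    check=True makes subprocess.run raise CalledProcessError when a step exits
    with a non-zero status, so apply is never attempted after a failed init.
    `run` is injectable so the sequencing can be tested without Terraform.
    """
    for cmd in TERRAFORM_STEPS:
        run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    apply_deployment()
```

Stopping at the first failure mirrors running the two commands by hand: there is no point applying a project whose providers were not installed.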

Test deployed endpoint

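The iris_classifier service receives requests on the /classify endpoint, so the full URL for the classifier has the form {EndpointUrl}/classify. The command below uses jq to read the endpoint URL from the Terraform output and then sends a sample request.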

```
URL=$(terraform output -json | jq -r .Endpoint.value)/classify
curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[5.1, 3.5, 1.4, 0.2]' \
  $URL
```
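The same request can be sent from Python with the standard library, which is convenient for smoke tests. This is a minimal sketch: the helper names are hypothetical, it assumes the Terraform output is named `Endpoint` (as in the jq expression above), and it sends the same sample payload as the curl command.

```python
import json
import urllib.request


def endpoint_from_terraform_output(output_json):
    """Extract the service URL from `terraform output -json` (like `jq -r .Endpoint.value`)."""
    return json.loads(output_json)["Endpoint"]["value"]


def build_classify_request(endpoint_url, features):
    """Build a POST request to the /classify API with a JSON body."""
    url = endpoint_url.rstrip("/") + "/classify"
    return urllib.request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    import subprocess

    # Must be run from the directory of the applied Terraform project.
    raw = subprocess.run(
        ["terraform", "output", "-json"], capture_output=True, text=True, check=True
    ).stdout
    request = build_classify_request(
        endpoint_from_terraform_output(raw), [5.1, 3.5, 1.4, 0.2]
    )
    with urllib.request.urlopen(request) as response:
        print(response.status, response.read().decode("utf-8"))
```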

Delete deployment

Use the bentoctl destroy command to remove the registry and the deployment:
```
bentoctl destroy -f deployment_config.yaml
```

Configuration Options

This is the list of configuration options you can use to deploy your bento to Google Cloud Run. For more information about each option, see the corresponding Google Cloud Run documentation.

The required configuration is:

project_id: Your Google Cloud project ID. Each project has a unique ID, and the deployment's resources are created within that project. If you haven't created a project yet, create one in the Google Cloud console. To see the project your gcloud CLI is currently configured to use, run gcloud config get-value project; to list all your projects, run gcloud projects list.
region: The region to deploy your Cloud Run service to. Check the official list of Cloud Run locations for all available regions.
port: The port the Cloud Run container should listen on. Note: this must match the port your bento service listens on (3000 by default in BentoML 1.x).
min_instances: The minimum number of instances Cloud Run keeps active, even when there is no traffic. Check the docs for more info
max_instances: The maximum number of instances Cloud Run should scale up to under load. Check the docs on how to configure it
max_concurrency: The maximum number of requests that can be processed simultaneously by a given container instance. Check the docs on how to configure it
memory: The memory (RAM) available to each instance. If an instance exceeds this limit, it is terminated. Check the docs for more info
cpu: The number of CPUs allocated to each instance. Check the docs for more info
cpu_always_allocated: When enabled, CPU stays allocated to instances outside of request processing, which is useful for short-lived background tasks and other asynchronous processing. Check the docs for more info
invokers: The principals or groups to grant the ability to invoke the service. Check the docs for more info
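Several of these options interact, so a quick sanity check before running bentoctl can catch mistakes early. The sketch below is hypothetical: `check_spec` and `max_in_flight_requests` are illustrative helpers, not the operator's own validation, and the rules are inferred from the option descriptions above. It works on the `spec` section as a Python dict; loading deployment_config.yaml itself would need a YAML parser such as PyYAML, which is not shown. The concurrency value of 80 in the usage is an assumed example, not a default taken from this operator.

```python
def check_spec(spec, bento_port=3000):
    """Return a list of problems found in a deployment spec (empty if none).

    Hypothetical checks based on the option descriptions; not the operator's validation.
    """
    problems = []
    if not 0 <= spec["min_instances"] <= spec["max_instances"]:
        problems.append("expected 0 <= min_instances <= max_instances")
    if spec["port"] != bento_port:
        problems.append(f"port {spec['port']} does not match the bento's port {bento_port}")
    if spec.get("max_concurrency", 1) < 1:
        problems.append("max_concurrency must be at least 1")
    return problems


def max_in_flight_requests(max_instances, max_concurrency):
    """Upper bound on requests served at once: every instance at full concurrency."""
    return max_instances * max_concurrency


if __name__ == "__main__":
    # The spec generated by `bentoctl init` in the quickstart.
    spec = {
        "project_id": "bentoml-316710",
        "region": "asia-east1",
        "port": 3000,
        "min_instances": 0,
        "max_instances": 1,
        "memory": "512M",
        "cpu": 1,
    }
    print(check_spec(spec))  # []
    # 80 is an assumed max_concurrency value, used only for illustration.
    print(max_in_flight_requests(spec["max_instances"], 80))  # 80
```

The capacity bound shows why max_instances and max_concurrency should be tuned together: with the quickstart's max_instances of 1, the whole service can handle at most max_concurrency requests at the same time.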