AWS Sagemaker deployment tool
Sagemaker is a fully managed service for building ML models. BentoML provides great support for deploying BentoService to AWS Sagemaker without the additional process and work from users. With BentoML, users can enjoy the performance of Sagemaker with any popular ML frameworks.
Prerequisites
- An active AWS account configured on the machine with AWS CLI installed and configured
- Install instruction: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
- Configure AWS account instruction: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
- Docker is installed and running on the machine.
- Install instruction: https://docs.docker.com/install
- Install required python packages
$ pip install -r requirements.txt
Quickstart
You can try out the deployment script with the IrisClassifier for the iris dataset that is given in the BentoML quick start guide
-
Build and save Bento Bundle from BentoML quick start guide
-
Copy and change the sample config file given and change it according to your deployment specifications. Check out the config section to find the differenet options.
-
Create Sagemaker deployment with the deployment tool.
Run deploy script in the command line:
$ BENTO_BUNDLE_PATH=$(bentoml get IrisClassifier:latest --print-location -q) $ python deploy.py $BENTO_BUNDLE_PATH my-sagemaker-deployment sagemaker_config.json
Get Sagemaker deployment information and status
$ python describe.py my-sagemaker-deployment # Sample output { │ 'StackId': 'arn:aws:cloudformation:ap-south-1:213386773652:stack/iristest-endpoint/edd9d500-095c-11ec-bc08-06418f3882f0', │ 'StackName': 'iristest-endpoint', │ 'StackStatus': 'CREATE_COMPLETE', │ 'CreationTime': '08/30/2021, 06:38:47', │ 'LastUpdatedTime': '08/30/2021, 06:38:52', │ 'OutputApiId': '2f5qtdd2rf', │ 'EndpointURL': 'https://2f5qtdd2rf.execute-api.ap-south-1.amazonaws.com/prod', │ 'api_name': 'predict' }
-
Make sample request against deployed service. The url for the endpoint given in the output of the describe command or you can also check the API Gateway through the AWS console.
$ curl -i \ --header "Content-Type: application/json" \ --request POST \ --data '[[5.1, 3.5, 1.4, 0.2]]' \ yr3v9vh407.execute-api.ap-south-1.amazonaws.com/prod/predict # Sample Output HTTP/1.1 200 OK Connection: keep-alive Content-Type: application/json X-Request-Id: f499b6d0-ad9b-4d79-850a-3dc058bd67b2 Content-Length: 3 Date: Mon, 28 Jun 2021 02:50:35 GMT Server: Python/3.7 aiohttp/3.7.4.post0 Via: 1.1 vegur [0]%
-
Delete Sagemaker deployment
python delete.py my-sagemaker-deployment
The Internals
This section is all about how the deployment tool works internally and how the actual deployment happens so that if needed you can modify this tool to suit your deployment needs. Under the hood the deployment tool modifies the Bento Image to make it compatible with the Sagemaker Endpoint. This image is then built and pushed into an ECR repository. All the other components are created as part of the cloudformation stack, which creates the API Gateway, Lambda Function, Sagmaker Model, Sagemaker Endpoint Config and Sagemaker Endpoint. We have used an HTTP API Gateway + Lambda Function design to expose the Sagemaker Endpoint since the Lambda function gives us a lot more flexibility.
For each bento endpoint that you have, the tool creates the corresponding route in HTTP Gateway which invokes the Lambda function which in turn invokes the Sagmaker endpoint and proxies the results back to the client. Since we are using Cloudformation to create the stack you can easily change/modify the resource to match your deployment needs. generate_resources.py
contains the functions used to generate the cloudformation template which you can modifiy to suit you needs.
Deployment operations
configuration options
A sample configuration file has been given has been provided here. Feel free to copy it over and change it for you specific deployment values
timeout
: timeout for API request in secondsworkers
: Number of workers for Bento API serverregion
: AWS region where Sagemaker endpoint is deploying toskip_stack_deployment
: If this flag is present in the config_file, deployment tool will only build and push the image to ECR and skip creation of sagemaker endpoint resources. With this you get your bentoml model build so that it runs and sagemaker and pushed to ECR and you can use other methods to create the resources to deploy the image.instance_type
: The ML compute instance type for Sagemaker endpoint. See https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint-config.html for available instance typesinitial_instance_count
: Number of instances to launch initially.enable_data_capture
: Enable Sagemaker capture data from requests and responses and store the captured data to AWS S3data_capture_s3_prefix
: S3 bucket path for store captured datadata_capture_sample_percent
: Percentage of the data will be captured to S3 bucket.
Create a new deployment
Use Command line
python deploy.py <BENTO_BUNDLE_PATH> <DEPLOYMENT_NAME> <CONFIG_JSON default is ./sagemaker_config.json>
For example:
$ MY_BUNDLE_PATH=$(bentoml get IrisClassifier:latest --print-location -q)
$ python deploy.py $MY_BUNDLE_PATH my_deployment --config_json sagemaker_config.json
Use Python API
from deploy import deploy_to_sagemaker
deploy_to_sagemaker(BENTO_BUNDLE_PATH, DEPLOYMENT_NAME, CONFIG_JSON)
To create and push a model image to ECR without deploying the stack, use the flag --skip_stack_deployment
Update an existing deployment
Use Command Line
python update.py <DEPLOYMENT_NAME> <BENTO_BUNDLE_PATH> <API_NAME> <CONFIG_JSON default is sagemaker_config.json>
Use Python API
from update import update_deployment
update_deployment(BENTO_BUNDLE_PATH, DEPLOYMENT_NAME, CONFIG_JSON)
Describe deployment status and information
Use Command line
python get.py <DEPLOYMENT_NAME>
Use Python API
from describe import describe_deployment
describe_deployment(DEPLOYMENT_NAME)
Delete deployment
Use Command line
python delete.py <DEPLOYMENT_NAME>
Use Python API
from delete import delete_deployment
delete_deployment(DEPLOYMENT_NAME)