CI/CD Pipeline for deploying custom SageMaker ML models using AWS SAM and Step Functions (in AWS GovCloud)
Automating the build and deployment of machine learning models is an important step in creating production-ready machine learning services. Models need to be retrained and redeployed whenever code or data are updated. This project gives an overview of using the Step Functions native service integrations with SageMaker to train and deploy ML models, test the results, and finally expose an inference endpoint through API Gateway and a Lambda function. The Step Functions state machine can also wait for human approval before it proceeds to the final configuration and deployment of the ML model inference endpoint.
The following diagram describes the flow of the Step Functions state machine. There are several points where the state machine has to poll and wait for a task to complete.
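The poll-and-wait pattern mentioned above can be illustrated outside Step Functions with a minimal Python sketch. This is not the state machine's actual definition: `wait_for_completion` and its parameters are hypothetical names, and the terminal statuses are assumed to follow SageMaker training job statuses (`Completed`, `Failed`, `Stopped`).

```python
import time
from typing import Callable

# Assumed terminal statuses, modeled on SageMaker training job statuses.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_completion(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it returns a terminal status, then return it.

    Raises TimeoutError if no terminal status is seen within max_attempts polls.
    """
    for _ in range(max_attempts):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Not finished yet: wait before checking again.
        sleep(poll_seconds)
    raise TimeoutError(f"no terminal status after {max_attempts} polls")
```

In a state machine, the same idea is typically expressed as a wait step, a status check, and a choice step that loops back until the task reaches a terminal status.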
The code for creating and operating the MLOps pipeline is divided into two GitHub repositories. This is the second repository, which focuses on building and deploying ML models to ECR and executing the Step Functions state machine created in the first repository.
- Set up an AWS account. (instructions)
- Configure AWS CLI and a local credentials file. (instructions)
- Clone this repo.
git clone https://github.com/bluecrayon52/codepipeline-ecr-build-sf-execution-govcloud.git
- Open VS Code and open the folder where the repo was cloned. The folder structure should look like the one shown below.
- To deploy this CloudFormation template to AWS, follow the steps below. Use the cfn/params.json file to supply your GitHubRepo, GitHubBranch, GitHubToken, GitHubUser and MlOpsStepFunctionArn values. Instructions for obtaining the MlOpsStepFunctionArn value can be found here.
aws cloudformation create-stack --stack-name codepipeline-ecr-build-sf-execution --template-body file://cfn/pipeline-cfn.yaml --parameters file://cfn/params.json --capabilities CAPABILITY_NAMED_IAM
- This CloudFormation template creates the CodePipeline pipeline, which triggers a CodeBuild build from the repository whenever file changes are committed to the container folder of the repo. (This usually happens when a data scientist updates the model development code and commits it to the repo.)
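As a rough illustration of the parameters file used in the deployment step, the sketch below generates JSON in the `ParameterKey`/`ParameterValue` list format that `aws cloudformation create-stack --parameters file://...` accepts. The five parameter names come from the step above; every value is a placeholder, the helper names `build_parameters` and `render_params_json` are hypothetical, and the params.json shipped in the repo may be laid out differently.

```python
import json

# Parameter names are taken from the deployment step; all values are placeholders.
PLACEHOLDER_VALUES = {
    "GitHubRepo": "<your-repo-name>",
    "GitHubBranch": "<your-branch>",
    "GitHubToken": "<your-github-token>",
    "GitHubUser": "<your-github-user>",
    "MlOpsStepFunctionArn": "<your-state-machine-arn>",
}


def build_parameters(values):
    """Convert a name-to-value mapping into the CloudFormation CLI parameter list."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


def render_params_json(values):
    """Serialize the parameter list as indented JSON text."""
    return json.dumps(build_parameters(values), indent=2)


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before saving it as cfn/params.json.
    print(render_params_json(PLACEHOLDER_VALUES))
```

Because the file holds a GitHub token, it is worth keeping the filled-in copy out of version control.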