Amazon SageMaker XGBoost Pipeline
This repository provisions an MLOps pipeline for the Amazon SageMaker built-in XGBoost model.
Prerequisites
- awscli
- Node.js 12.x+
- Python 3.7+
- Docker
- An AWS account and locally configured AWS credentials
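Before installing, it can help to confirm that the command-line prerequisites are on the PATH. The following is a minimal sketch, not part of this repository: `missing_tools` is a hypothetical helper name, and it only checks that each command exists, not that its version meets the minimums listed above.

```shell
# Print the name of every command in the argument list that is not
# available in the current shell. Prints nothing if all are found.
# (missing_tools is a hypothetical helper, not part of this repository.)
missing_tools() {
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      echo "$mt_tool"
    fi
  done
}

# Usage (command names assumed from the prerequisites list):
#   missing_tools aws node npm python3 docker
```

An empty result means every listed command was found. Credentials still have to be configured separately.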
Installation
Install project dependencies
$ cd infra
$ npm i
Install the CDK globally and run cdk bootstrap
if you have not initialized the CDK yet.
$ npm i -g cdk@1.100.0
$ cdk bootstrap
Open config.ts and set the App.Webhook variable to the URI that will receive a POST
from the Step Functions terminal tasks (Succeed / Fail)
Deploy the CDK stacks to AWS
$ cdk deploy "*" --require-approval never
Usage
Upload dataset to S3
The deployment prints the bucket name on the terminal:
SagemakerXgboostDemoInfraStack.SagemakerBucketOutput = sagemakerxgboostdemoinfra-sagemakerbucketXXXXXXXX-YYYYYYYYYYYY
Set it as an environment variable for the commands below
$ export BUCKET_NAME=sagemakerxgboostdemoinfra-sagemakerbucketXXXXXXXX-YYYYYYYYYYYY
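Instead of copying the value from the terminal, the stack output can also be read back with the AWS CLI. The following is a minimal sketch: `stack_output` is a hypothetical helper name, and the stack name and output key are taken from the example deployment output above. It matches the output key by prefix, so it also works for keys that end in a generated suffix.

```shell
# Print the value of the first stack output whose key starts with a prefix.
# $1: CloudFormation stack name, $2: output key prefix
# (stack_output is a hypothetical helper, not part of this repository.)
stack_output() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query "Stacks[0].Outputs[?starts_with(OutputKey, '$2')].OutputValue | [0]" \
    --output text
}

# Usage:
#   export BUCKET_NAME=$(stack_output SagemakerXgboostDemoInfraStack SagemakerBucketOutput)
```

If no output key matches the prefix, the CLI prints None, so check the value before using it. The same helper should work for the state machine ARN output later in this section with the prefix SagemakerStatesStatemachineArn.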
Upload the original dataset to the bucket. The dataset is the credit card clients dataset from UCI. With this data, we classify whether a given user will be overdue on their loan payment next month.
$ aws s3 cp ../data/card.xls s3://$BUCKET_NAME/card.xls
Execute the state machine
The deployment prints the state machine ARN on the terminal:
SagemakerXgboostDemoInfraStack.SagemakerStatesStatemachineArnXXXXXXXX = arn:aws:states:ap-northeast-2:XXXXXXXXXXXX:stateMachine:StateMachine
Set it as an environment variable for the commands below
$ export STATE_MACHINE=arn:aws:states:ap-northeast-2:XXXXXXXXXXXX:stateMachine:StateMachine
Run the state machine with the AWS CLI
$ aws stepfunctions start-execution --state-machine-arn $STATE_MACHINE
{
"executionArn": "arn:aws:states:ap-northeast-2:929831892372:execution:StateMachine:b1b23dd1-b2e6-40dd-b1b8-b07183505d9e",
"startDate": 1617504354.973
}
Visit the AWS Step Functions console to see the progress of the state machine
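Progress can also be followed from the terminal by polling the execution status. The following is a minimal sketch: `wait_for_execution` is a hypothetical helper name, and the 30-second default interval is an arbitrary choice. It calls `aws stepfunctions describe-execution` until the execution leaves the RUNNING status.

```shell
# Poll a Step Functions execution until it reaches a terminal status.
# $1: execution ARN, $2: seconds between polls (default 30, arbitrary)
# Returns 0 on SUCCEEDED, 1 on FAILED/TIMED_OUT/ABORTED, 2 if the CLI call fails.
# (wait_for_execution is a hypothetical helper, not part of this repository.)
wait_for_execution() {
  wfe_arn="$1"
  wfe_interval="${2:-30}"
  while true; do
    wfe_status=$(aws stepfunctions describe-execution \
      --execution-arn "$wfe_arn" \
      --query 'status' --output text) || return 2
    echo "status: $wfe_status"
    case "$wfe_status" in
      SUCCEEDED) return 0 ;;
      FAILED|TIMED_OUT|ABORTED) return 1 ;;
    esac
    sleep "$wfe_interval"
  done
}

# Usage:
#   EXECUTION_ARN=$(aws stepfunctions start-execution \
#     --state-machine-arn "$STATE_MACHINE" \
#     --query 'executionArn' --output text)
#   wait_for_execution "$EXECUTION_ARN"
```

The return code makes the helper usable in scripts, for example to run a follow-up step only when the pipeline succeeds.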
Test
Ref notebook
Cleanup
$ cdk destroy "*"