Amazon SageMaker XGBoost Pipeline

This repository provisions an MLOps pipeline for the Amazon SageMaker built-in XGBoost model.

Prerequisites

  • awscli
  • Node.js 12.x+
  • Python 3.7+
  • Docker
  • An AWS account and locally configured AWS credentials
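
Before installing, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch, not part of this repository: the helper name missing_tools is hypothetical, and the command names in the usage comment (aws, node, python3, docker) are assumptions about how the tools are installed on your machine. It does not check versions.

```shell
# Print every command from the argument list that is not found on PATH.
# Returns 0 when all commands are present and 1 when at least one is missing.
missing_tools() {
  local status=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Usage (command names are assumptions; Python may be installed as "python"):
#   missing_tools aws node python3 docker || echo "install the tools listed above"
```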

Installation

Install the project dependencies:

$ cd infra
$ npm i

Install cdk globally, and run cdk bootstrap if you have not initialized cdk yet:

$ npm i -g cdk@1.100.0
$ cdk bootstrap

Open config.ts and set the App.Webhook variable to the URI that the Step Functions terminal task (Succeed / Fail) will POST to.

Deploy the CDK stacks to AWS:

$ cdk deploy "*" --require-approval never

Usage

Upload dataset to S3

The deployment prints the bucket name on the terminal:

SagemakerXgboostDemoInfraStack.SagemakerBucketOutput = sagemakerxgboostdemoinfra-sagemakerbucketXXXXXXXX-YYYYYYYYYYYY

Set it as an environment variable for use in the commands below:

$ export BUCKET_NAME=sagemakerxgboostdemoinfra-sagemakerbucketXXXXXXXX-YYYYYYYYYYYY
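
Instead of copying the value by hand, the same output can be read back from CloudFormation. This is a sketch under assumptions: the helper name stack_output is hypothetical, and the stack name and output key are taken from the sample deployment output above, so they may differ in your account. It matches the output key by prefix because CDK can append a generated suffix to output names.

```shell
# Print the value of the first stack output whose key starts with a prefix.
# $1: CloudFormation stack name
# $2: output key prefix
stack_output() {
  aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query "Stacks[0].Outputs[?starts_with(OutputKey, '$2')].OutputValue | [0]" \
    --output text
}

# Usage (names assumed from the sample output above):
#   export BUCKET_NAME="$(stack_output SagemakerXgboostDemoInfraStack SagemakerBucketOutput)"
```

The helper works for any other output of the stack as well, given the matching key prefix.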

Upload the original dataset to the bucket. The dataset is the credit card clients dataset from UCI. With this data, we are going to classify whether a given user will be overdue on the loan next month.

$ aws s3 cp ../data/card.xls s3://$BUCKET_NAME/card.xls
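
To confirm that the upload succeeded before starting the pipeline, you can check that the object exists. This is a minimal sketch; the helper name object_exists is hypothetical, and it only reports whether a HEAD request on the object succeeds with your current credentials.

```shell
# Return success if the given key exists in the given bucket.
# $1: bucket name
# $2: object key
object_exists() {
  aws s3api head-object --bucket "$1" --key "$2" >/dev/null 2>&1
}

# Usage:
#   object_exists "$BUCKET_NAME" card.xls && echo "dataset uploaded"
```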

Execute the state machine

The deployment prints the state machine ARN on the terminal:

SagemakerXgboostDemoInfraStack.SagemakerStatesStatemachineArnXXXXXXXX = arn:aws:states:ap-northeast-2:XXXXXXXXXXXX:stateMachine:StateMachine

Set it as an environment variable for use in the commands below:

$ export STATE_MACHINE=arn:aws:states:ap-northeast-2:XXXXXXXXXXXX:stateMachine:StateMachine

Run the state machine with the AWS CLI:

$ aws stepfunctions start-execution --state-machine-arn $STATE_MACHINE
{
    "executionArn": "arn:aws:states:ap-northeast-2:929831892372:execution:StateMachine:b1b23dd1-b2e6-40dd-b1b8-b07183505d9e",
    "startDate": 1617504354.973
}
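
If you prefer to follow the execution from the terminal, the execution status can be polled with the AWS CLI. The sketch below is an illustration rather than part of this repository: the helper name wait_for_execution and the default polling interval of 30 seconds are assumptions.

```shell
# Poll a Step Functions execution until it leaves the RUNNING state.
# Prints the final status; succeeds only if that status is SUCCEEDED.
# $1: execution ARN
# $2: seconds between polls (optional, defaults to 30)
wait_for_execution() {
  local arn="$1"
  local interval="${2:-30}"
  local status
  while :; do
    status="$(aws stepfunctions describe-execution \
      --execution-arn "$arn" --query status --output text)" || return 1
    if [ "$status" != "RUNNING" ]; then
      echo "$status"
      [ "$status" = "SUCCEEDED" ]
      return
    fi
    sleep "$interval"
  done
}

# Usage: capture the execution ARN when starting the run, then wait for it.
#   EXECUTION_ARN="$(aws stepfunctions start-execution \
#     --state-machine-arn "$STATE_MACHINE" --query executionArn --output text)"
#   wait_for_execution "$EXECUTION_ARN"
```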

Visit the AWS Step Functions console to see the progress of the state machine.

Test

Refer to the notebook.

Cleanup

$ cdk destroy "*"