Primary language: Jupyter Notebook. License: MIT.

shapemaker

👓 Overview

shapemaker is an end-to-end template for Amazon SageMaker projects on AWS, aiming for maximum flexibility.

If you find that SageMaker does not offer enough flexibility out of the box when it comes to customizing

  1. training jobs
  2. endpoints or
  3. how to serve endpoints

then shapemaker might be a good fit for you.

shapemaker builds on the Bring Your Own Container (BYOC) SageMaker functionality for full developer control.
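
To make the BYOC idea concrete, here is a minimal sketch of the contract a training container follows. SageMaker starts the training image with the argument train, provides hyperparameters in /opt/ml/input/config/hyperparameters.json, and uploads whatever the container writes to /opt/ml/model when the job ends. The function name run_training, the learning_rate hyperparameter and the toy "model" are assumptions made for illustration; they are not taken from the template's own train.py.

```python
import json
import tempfile
from pathlib import Path


def run_training(prefix: str = "/opt/ml") -> Path:
    """Toy training entry point that follows SageMaker's container directory layout.

    `run_training` and the `learning_rate` hyperparameter are hypothetical names.
    """
    root = Path(prefix)
    hyperparameters_file = root / "input" / "config" / "hyperparameters.json"
    model_dir = root / "model"

    # SageMaker passes every hyperparameter value as a string, so cast explicitly.
    hyperparameters = {}
    if hyperparameters_file.exists():
        hyperparameters = json.loads(hyperparameters_file.read_text())
    learning_rate = float(hyperparameters.get("learning_rate", "0.1"))

    # Placeholder for real training: the "model" is just a small dictionary.
    model = {"learning_rate": learning_rate, "weights": [0.0, 0.0]}

    # Files written to the model directory are packaged and uploaded to S3 by SageMaker.
    model_dir.mkdir(parents=True, exist_ok=True)
    artifact = model_dir / "model.json"
    artifact.write_text(json.dumps(model))
    return artifact


if __name__ == "__main__":
    # Local dry run against a temporary directory instead of /opt/ml.
    with tempfile.TemporaryDirectory() as tmp:
        config_dir = Path(tmp) / "input" / "config"
        config_dir.mkdir(parents=True)
        (config_dir / "hyperparameters.json").write_text('{"learning_rate": "0.05"}')
        print(run_training(tmp).read_text())
```

Taking the directory prefix as a parameter is a design choice of this sketch: it lets the same code run inside the container (with the default /opt/ml) and on a laptop against a scratch directory.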

The template includes:

  • a minimalistic template for model code
  • a template for a docker image for model training
  • an endpoint docker image template for real-time inference
  • command-line functions for interacting with the model/endpoint
  • command-line functions for delivering and integrating the model with SageMaker
  • workflows enabling continuous integration/delivery.

shapemaker targets full-stack data scientists with intermediate knowledge of Python, Amazon SageMaker and AWS in general, Docker, shell scripting, and web application development.

🎦 Demo

The video linked below gives a quick walkthrough of some of the most important features of the shapemaker template. It shows how to build training and endpoint images, how to create training jobs and endpoints from the command line, and how to enable the shapemaker CI/CD workflows.

Watch the video

💻 Requirements

Cloud Services

  • Amazon Web Services*

Operating systems

  • Linux
  • macOS

Software

CI/CD

*: If you want to experiment, you can create a free AWS account and use the AWS Free Tier resources free of charge, up to the specified limits for each service.

The template was tested on Ubuntu 22.04 LTS with AWS CLI v2.

ℹ️ How to use

🆕 Create project from template

Create a project from the shapemaker template using Cookiecutter:

cookiecutter gh:smaakage85/shapemaker

The inputs for the template are described below:

| Input | Description |
| --- | --- |
| PROJECT_NAME | Name of the model project. |
| PY_VERSION | Version of Python to use, e.g. '3.9'. |
| DIR_MODEL_LOCAL | Local directory for model artifact storage, e.g. './artifacts'. |
| DIR_TMP | Directory for temporary files, e.g. '/tmp'. |
| AWS_ACCOUNT_ID | 12-digit AWS account ID. |
| AWS_DEFAULT_REGION | AWS default region. |
| ECR_REPO | Name of the AWS ECR repository where containers are published. |
| SAGEMAKER_ROLE | Name of the SageMaker execution role to be assumed by SageMaker. |
| BUCKET_ARTIFACTS | Name of the S3 bucket for model artifact storage. NOTE: prefix with 'sagemaker' for immediate SageMaker access, e.g. 'sagemaker-artifacts-blablabla'. |

NOTE: do NOT put quotes around the input values.
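
Before running Cookiecutter, it can help to sanity-check the values you plan to enter. The sketch below checks only what the input descriptions state (no empty values, no surrounding quotes, a 12-digit account ID, a version string such as '3.9'). The function validate_inputs and the sample values are hypothetical and are not part of the template.

```python
import re

# Input names as listed in the template's input descriptions.
REQUIRED_INPUTS = (
    "PROJECT_NAME", "PY_VERSION", "DIR_MODEL_LOCAL", "DIR_TMP",
    "AWS_ACCOUNT_ID", "AWS_DEFAULT_REGION", "ECR_REPO",
    "SAGEMAKER_ROLE", "BUCKET_ARTIFACTS",
)


def validate_inputs(inputs: dict) -> list:
    """Return a list of human-readable problems; an empty list means the inputs look fine."""
    problems = []
    for name in REQUIRED_INPUTS:
        value = inputs.get(name, "")
        if not value:
            problems.append(f"{name} is missing or empty")
        elif value[0] in "'\"" or value[-1] in "'\"":
            problems.append(f"{name} must not be wrapped in quotes")

    account_id = inputs.get("AWS_ACCOUNT_ID", "")
    if account_id and not re.fullmatch(r"\d{12}", account_id):
        problems.append("AWS_ACCOUNT_ID must consist of exactly 12 digits")

    py_version = inputs.get("PY_VERSION", "")
    if py_version and not re.fullmatch(r"\d+\.\d+", py_version):
        problems.append("PY_VERSION should look like 3.9")
    return problems


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {
        "PROJECT_NAME": "churn",
        "PY_VERSION": "3.9",
        "DIR_MODEL_LOCAL": "./artifacts",
        "DIR_TMP": "/tmp",
        "AWS_ACCOUNT_ID": "123456789012",
        "AWS_DEFAULT_REGION": "eu-west-1",
        "ECR_REPO": "churn-images",
        "SAGEMAKER_ROLE": "churn-sagemaker-role",
        "BUCKET_ARTIFACTS": "sagemaker-churn-artifacts",
    }
    print(validate_inputs(sample) or "inputs look fine")
```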

🔧 Set-up project

Initialize the project by running make init from the command line in the project directory. The init target makes the included shell scripts executable and provisions the relevant AWS infrastructure.

Export project-specific environment variables automatically with direnv, i.e. by invoking direnv allow.
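
A quick way to confirm that direnv has exported the project variables into the current shell is a small pre-flight check like the one below. It assumes that .envrc exports variables named after the corresponding template inputs; that naming, the chosen subset and the helper missing_variables are assumptions, so adjust the tuple to whatever your .envrc actually contains.

```python
import os

# Assumed variable names (mirroring the template inputs); check your own .envrc.
PROJECT_VARIABLES = (
    "AWS_ACCOUNT_ID",
    "AWS_DEFAULT_REGION",
    "ECR_REPO",
    "SAGEMAKER_ROLE",
    "BUCKET_ARTIFACTS",
)


def missing_variables(environ=None, required=PROJECT_VARIABLES) -> list:
    """Return the names in `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Not exported (did you run direnv allow?): " + ", ".join(missing))
    else:
        print("All project variables are exported.")
```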

📁 Template structure

To help you navigate the shapemaker template, here is an overview of the folder structure:

./
├── .github/
│   └── workflows/            # Workflows for automation, CI/CD.
├── modelpkg/                 # Python package defining model logic.
│   ├── construct.py          # Code for constructing and training the model etc.
│   └── tests/                # Unit tests for model code.
├── aws/                      # Shell scripts for integrating the project with SageMaker.
├── configs/                  # Configurations for SageMaker endpoints, training jobs, etc.
├── images/                   # Docker images for model training and model endpoint.
├── server/                   # Configuration for a default NGINX web server for the model endpoint.*
├── .envrc                    # Project-specific environment variables.
├── Makefile                  # Command-line functions for project-specific tasks.
├── train.py                  # Script for training the model. Built into the training image.
├── app.py                    # Application code for the model endpoint. Built into the endpoint image.
├── requirements_modelpkg.txt # Python packages required by the model.
└── requirements_dev.txt      # All other Python packages needed in development mode.

*: copied from an AWS example.

The level of modification needed for the individual files will depend on your specific use-case.
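
As orientation for what the endpoint application has to provide, the sketch below shows the HTTP contract SageMaker expects from a real-time inference container: GET /ping for health checks and POST /invocations for predictions, served on port 8080. It uses only the Python standard library to stay self-contained. It is not the template's app.py, and predict, EndpointHandler and make_server are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload: dict) -> dict:
    """Hypothetical stand-in for model inference: sums the input features."""
    return {"prediction": sum(payload.get("features", []))}


class EndpointHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Health check: SageMaker calls GET /ping and expects HTTP 200.
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        # Inference: endpoint invocations arrive as POST /invocations.
        if self.path != "/invocations":
            self._reply(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self._reply(400, {"error": "request body must be a JSON object"})
            return
        self._reply(200, predict(payload))

    def log_message(self, format, *args):
        pass  # Silence per-request logging in this sketch.


def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Create (but do not start) the server; SageMaker expects port 8080."""
    return HTTPServer((host, port), EndpointHandler)
```

Calling make_server().serve_forever() starts the sketch locally. According to the folder overview, the template puts an NGINX web server in front of the application code, which this sketch leaves out.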

🐚 Command-line functions

All tasks for interacting with the model project are implemented as command-line functions in ./Makefile, so each function is invoked with make [target], e.g. make build_training_image.

To build, train and deploy a model in one go, invoke the following sequence of make targets:

  1. make init
  2. make build_training_image
  3. make push_training_image
  4. make create_training_job
  5. make build_endpoint_image
  6. make push_endpoint_image
  7. make create_endpoint
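
The sequence above can also be scripted so that it stops at the first failing step. The sketch below is one way to do that with Python's subprocess module; run_pipeline and its runner parameter are hypothetical helpers, not part of the template, and the target names are the ones listed above.

```python
import subprocess

# Make targets in the order given in the list above.
TARGETS = (
    "init",
    "build_training_image",
    "push_training_image",
    "create_training_job",
    "build_endpoint_image",
    "push_endpoint_image",
    "create_endpoint",
)


def run_pipeline(targets=TARGETS, runner=subprocess.run) -> list:
    """Run `make <target>` for each target in order and stop at the first failure.

    Returns the targets that completed successfully. `runner` is injectable so
    the function can be exercised without actually calling make.
    """
    completed = []
    for target in targets:
        result = runner(["make", target])
        if result.returncode != 0:
            print(f"make {target} failed with exit code {result.returncode}; stopping.")
            break
        completed.append(target)
    return completed
```

Run run_pipeline() from the project directory, with the project environment variables exported, to execute all seven steps.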

Typing make followed by a space and then pressing Tab twice lists all available make targets, provided your shell has tab completion for make enabled.

🔁 CI/CD workflows

shapemaker ships with a number of automation (CI/CD) workflows implemented with GitHub Actions.

To enable the CI/CD workflows, upload your project to GitHub and connect the GitHub repository to your AWS account by providing your AWS credentials as GitHub Secrets. The secrets must be named:

  1. AWS_ACCESS_KEY_ID
  2. AWS_SECRET_ACCESS_KEY

By default, every commit to main triggers the workflow ./.github/workflows/deliver_images.yaml, which runs the unit tests and then builds and pushes the training and endpoint images.

All workflows can be run manually.

📢 Shout-outs

A big thanks for the inspiration goes to:

📫 Contact

Please direct any questions and feedback to me!

If you want to contribute, open a PR.

If you encounter a bug or want to suggest an enhancement, please open an issue.