
Machine Learning with MATLAB® and Amazon SageMaker® Demo

This repository demonstrates an approach to using Amazon SageMaker's support for bring-your-own-algorithms and frameworks to train and deploy MATLAB machine learning models.

For more information, see the Running MATLAB Machine Learning Jobs in Amazon SageMaker webinar from MATLAB Expo 2023.

Prerequisites

  • An Amazon Web Services™ (AWS) account.
  • A Linux® computer with
    • MATLAB R2023a with
      • Statistics and Machine Learning Toolbox
      • MATLAB Compiler
      • MATLAB Compiler SDK
    • make
    • Docker®
    • Python® 3 with pip and venv

This repository includes code to build a Docker container that uses MATLAB batch token licensing. Batch token licensing is currently only available as part of a pilot program. For more information about batch token eligibility, contact the MathWorks cloud team at cloud@mathworks.com.

Costs

You are responsible for the cost of the AWS services used.

What is in this repo?

Code

  • matlab - MATLAB code that provides sagemaker.MATLABEstimator() and other classes. It uses the Amazon SageMaker Python SDK to call Amazon SageMaker APIs. Training a model requires building the training image from the docker folder; deploying a model uses MATLAB Compiler SDK's compiler.package.microserviceDockerImage functionality.
  • docker - a Dockerfile that builds a container suitable for use in a SageMaker training job to train a MATLAB model. The image contains MATLAB code that glues the Amazon SageMaker training environment to the user-supplied MATLAB code for training models.

Examples

  • TrainAndDeployClassificationTree - shows using MATLAB in an Amazon SageMaker training job to train a decision tree (using fitctree) on the Fisher iris data, deploying that model to an Amazon SageMaker endpoint, and then requesting a prediction from that endpoint.
  • DeployExistingModel.mlx - shows deploying a pretrained MATLAB model to an Amazon SageMaker endpoint, and then requesting a prediction from that endpoint.

Getting Started

If you are using an AWS profile other than default, add it to a .env file in the root of the repo:

echo AWS_PROFILE=myprofile > .env

Put your MATLAB batch licensing token in training.env.
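For example, something like the following — note that the variable name shown here is an assumption; use whatever name your batch licensing pilot documentation specifies:

```shell
# Hypothetical example — confirm the expected variable name with the
# MathWorks batch licensing pilot documentation before using it.
echo "MLM_LICENSE_TOKEN=<your-batch-licensing-token>" > training.env
```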

Training a Model

1. Create a MATLAB training image

This container is based on mathworks/matlab-deps with the following changes/additions:

  1. Runs as root (Amazon SageMaker requires this for access to mounted volumes).
  2. Installs MATLAB, Statistics and Machine Learning Toolbox, and Parallel Computing Toolbox.
  3. Adds matlab-batch.
  4. Adds MATLAB code from this repository that glues the Amazon SageMaker training environment to the user-supplied MATLAB code.
  5. Provides an entrypoint that installs any other required products and then calls matlab-batch to run the training code.

To build and push the training image:

cd docker
make build 
make test-local
make push

2. Do the training

In MATLAB:

  1. Create a sagemaker.MATLABEstimator()
  2. Upload training data to Amazon S3
  3. Call fit()
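The three steps above might look roughly like the following sketch. The constructor arguments and property names here are illustrative assumptions, not the class's actual signature — see the TrainAndDeployClassificationTree example for working code.

```matlab
% Illustrative sketch only — the argument names below are assumptions,
% not the actual sagemaker.MATLABEstimator interface.
estimator = sagemaker.MATLABEstimator( ...
    "Role", role, ...                    % IAM role for the training job
    "Image", trainingImageURI, ...       % image built from the docker folder
    "TrainingFunction", @trainMyModel);  % your MATLAB training function

% Training data uploaded to an S3 bucket you own (hypothetical URI).
trainingData = "s3://my-bucket/training-data/";

% Launch the SageMaker training job.
estimator.fit(trainingData);
```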

What is this doing?

  1. The TrainingFunction is analyzed and a .mltbx file is created containing the TrainingFunction and all files required to execute it.
  2. The .mltbx file is copied to Amazon S3 and its location is passed to the training job via the hyperparameters.
  3. Any additional products required are passed to the training image via the MATLAB_REQUIRED_PRODUCTS environment variable.
  4. SageMaker runs the training image with the train command.
  5. The training image installs any products specified by MATLAB_REQUIRED_PRODUCTS and then calls matlab-batch train.
  6. train is a function provided by this repo that installs the training job's .mltbx file and executes the training function from it.

Deploying a model for inference

  1. To deploy a model, you need to provide an inference handler that subclasses sagemaker_inference.DefaultInferenceHandler.
  2. When deploying a model, the inference handler is compiled and packaged using MATLAB Compiler SDK's compiler.package.microserviceDockerImage functionality.
    • The Dockerfile generated by compiler.package.microserviceDockerImage is modified before building the image to meet SageMaker's requirements for an inference container.
  3. The image is pushed to an Amazon Elastic Container Registry (Amazon ECR) repository.
  4. An Amazon SageMaker endpoint is created that uses that container, and then predictions can be made against that endpoint just like any other Amazon SageMaker endpoint.
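From MATLAB, the deploy-and-predict flow might be driven roughly as follows. The method and argument names are assumptions modeled on the SageMaker Python SDK's estimator.deploy()/predictor pattern, not this repo's exact API.

```matlab
% Illustrative sketch — method and argument names are assumptions.
predictor = estimator.deploy( ...
    "InstanceType", "ml.m5.large", ...            % endpoint instance size
    "InferenceHandler", "MyInferenceHandler");    % your handler class

% Request a prediction; input is encoded as text/csv by default.
result = predictor.predict(newData);
```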

Writing an inference handler

  1. Create a new MATLAB class that inherits from sagemaker_inference.DefaultInferenceHandler.
  2. At a minimum, add a %#function pragma to this implementation to specify the MATLAB functions needed to evaluate the model types you want to support.
  3. Override any of the decode_input, load_model, predict, or encode_output methods.
    • decode_input: the default implementation supports input data of type text/csv and returns the data as a MATLAB table.
    • encode_output: the default implementation supports encoding a MATLAB table of output data with type text/csv.
    • load_model: the default implementation loads a variable called model from a MAT file called model.mat in the SageMaker model folder (typically /opt/ml/model).
    • predict: the default implementation attempts to evaluate the loaded model as output = model(inputData).
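A minimal handler that overrides only predict might look like this sketch. The superclass method signatures are assumed from the defaults described above, and the class name is hypothetical; the %#function pragma here names fitctree so the compiler includes the machinery needed to evaluate classification trees.

```matlab
% Hypothetical sketch of a custom inference handler for models trained
% with fitctree; method signatures are assumptions, not the repo's API.
classdef MyInferenceHandler < sagemaker_inference.DefaultInferenceHandler
    methods
        function output = predict(obj, model, inputData)
            % Tell MATLAB Compiler to include fitctree's dependencies,
            % which are only reached through indirect dispatch.
            %#function fitctree
            output = predict(model, inputData);
        end
    end
end
```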

Copyright 2023 The MathWorks, Inc.