This code is used for making MindsDB run on Amazon SageMaker.

MindsDB SageMaker Container

This repository contains the MindsDB containers for use with SageMaker.

The MindsDB container supports two execution modes on SageMaker: training, where MindsDB uses input data to train a new model, and serving, where it accepts HTTP requests and uses the previously trained model to make predictions.

Build an image

Execute the following command to build the image:

docker build -t mindsdb-impl .

Note that mindsdb-impl will be the name of the image.

Test the container locally

All of the files for testing the setup are located inside the local_test directory.

Test directory

  • train_local.sh: Instantiate the container configured for training.
  • serve_local.sh: Instantiate the container configured for serving.
  • predict.sh: Run predictions against a locally instantiated server.
  • test-dir: This subdirectory is mounted in the container.
  • test_data: This subdirectory contains a few tabular format datasets used for getting the predictions.
  • input/data/training/file.csv: The training data.
  • model: The directory where mindsdb writes the model files.
  • output: The directory where mindsdb can write its failure file.
  • call.py: A CLI that can be used for testing the model deployed on a SageMaker endpoint.

All of the files under test-dir are mounted into the container and mimic the SageMaker directory structure.
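As a rough illustration of that layout, the sketch below recreates an empty test-dir style tree in a temporary directory. On SageMaker the same subdirectories live under /opt/ml; whether the local scripts mount test-dir at exactly that path is an assumption here, and `make_test_dir` is a hypothetical helper, not part of this repository.

```python
import tempfile
from pathlib import Path

# Subdirectories named in the list above; on SageMaker they sit under /opt/ml.
LAYOUT = ("input/data/training", "model", "output")


def make_test_dir(root: Path) -> Path:
    """Create an empty test-dir style tree under `root` and return its path."""
    test_dir = root / "test-dir"
    for sub in LAYOUT:
        (test_dir / sub).mkdir(parents=True, exist_ok=True)
    return test_dir


# Build the tree in a scratch location and list the directories created.
scratch = Path(tempfile.mkdtemp())
test_dir = make_test_dir(scratch)
for path in sorted(test_dir.rglob("*")):
    print(path.relative_to(test_dir).as_posix())
```

Copying a CSV file into input/data/training of such a tree gives the train script a dataset to work with.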

Run tests

To train the model, execute the train script and specify the tag name of the Docker image:

./train_local.sh mindsdb-impl

The train script will use the dataset that is located in the input/data/training/ directory.

Then start the server:

./serve_local.sh mindsdb-impl

And make predictions by specifying the payload file in JSON format:

./predict.sh payload.json
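The same request can be sent from Python. SageMaker-compatible serving containers answer POST requests on /invocations, port 8080; the sketch below assumes that serve_local.sh publishes that port on localhost, which is not confirmed here. `build_invocation_request` and `predict_local` are hypothetical helper names.

```python
import urllib.request

# Assumption: serve_local.sh maps the container's port 8080 to localhost.
LOCAL_URL = "http://localhost:8080/invocations"


def build_invocation_request(payload: bytes,
                             content_type: str = "application/json",
                             url: str = LOCAL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the payload."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


def predict_local(payload_path: str) -> str:
    """Send the payload file to the local server and return the response text."""
    with open(payload_path, "rb") as payload_file:
        request = build_invocation_request(payload_file.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(predict_local("payload.json"))` would print whatever the container returns.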

Push the image to Amazon Elastic Container Registry

Use the shell script build-and-push.sh to push the latest image to Amazon Elastic Container Registry (ECR). You can run it as:

 ./build-and-push.sh mindsdb-impl 

The script looks for an Amazon ECR repository in your default region and creates a new one if it doesn't exist.
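The look-up-or-create step can be sketched in Python with the boto3 ECR client. This is not the content of build-and-push.sh, only an illustration of the same logic; `ensure_repository` is a hypothetical helper, and the assumption is that the repository is named after the image.

```python
def ensure_repository(ecr_client, name: str) -> str:
    """Return the URI of ECR repository `name`, creating it if it is missing.

    `ecr_client` is expected to behave like boto3.client("ecr").
    """
    try:
        found = ecr_client.describe_repositories(repositoryNames=[name])
        return found["repositories"][0]["repositoryUri"]
    except ecr_client.exceptions.RepositoryNotFoundException:
        # The repository does not exist yet in this region, so create it.
        created = ecr_client.create_repository(repositoryName=name)
        return created["repository"]["repositoryUri"]


# Usage (requires AWS credentials and a default region):
#   import boto3
#   print(ensure_repository(boto3.client("ecr"), "mindsdb-impl"))
```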

Training

When you create a training job, Amazon SageMaker sets up the environment, performs the training, and then stores the model artifacts in the location you specified when you created the job.

Required parameters

  • Algorithm source: Choose "Your own algorithm" and provide the registry path where the MindsDB image is stored in Amazon ECR, e.g. 846763053924.dkr.ecr.us-east-1.amazonaws.com/mindsdb_impl
  • Input data configuration: Choose S3 as the data source and provide the path to the bucket where the dataset is stored, e.g. s3://bucket/path-to-your-data/
  • Output data configuration: The S3 location where the model artifacts will be stored, e.g. s3://bucket/path-to-write-models/
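The same settings can be passed programmatically. The sketch below builds the arguments for the boto3 `create_training_job` call; `training_job_request` is a hypothetical helper, and the job name, role ARN, instance settings, and time limit are illustrative placeholders rather than values required by MindsDB. The channel name "training" is assumed to match the input/data/training directory used in the local tests.

```python
def training_job_request(job_name, image_uri, role_arn,
                         input_s3_uri, output_s3_uri, hyperparameters):
    """Build keyword arguments for the SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,        # Algorithm source
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{                  # Input data configuration
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Illustrative resource and time limits.
        "ResourceConfig": {"InstanceType": "ml.m4.xlarge",
                           "InstanceCount": 1, "VolumeSizeInGB": 10},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # SageMaker expects every hyperparameter value to be a string.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
    }


request = training_job_request(
    job_name="mindsdb-example-job",                               # hypothetical
    image_uri="846763053924.dkr.ecr.us-east-1.amazonaws.com/mindsdb_impl",
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",        # hypothetical
    input_s3_uri="s3://bucket/path-to-your-data/",
    output_s3_uri="s3://bucket/path-to-write-models/",
    hyperparameters={"to_predict": "Class"},
)

# To start the job (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_training_job(**request)
```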

Add HyperParameters

You can use hyperparameters to finely control training. The required hyperparameter for training models with MindsDB is to_predict: the column that the model should learn to predict from the rest of the data in the file, e.g. to_predict = Class.
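Inside a training container, SageMaker exposes the job's hyperparameters as a JSON file of string values at /opt/ml/input/config/hyperparameters.json. How the MindsDB container reads that file is not shown in this document; the following is only a minimal sketch of such a lookup, with `load_to_predict` as a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

# Where SageMaker writes the hyperparameters inside a training container.
SAGEMAKER_HYPERPARAMETERS = Path("/opt/ml/input/config/hyperparameters.json")


def load_to_predict(config_path: Path = SAGEMAKER_HYPERPARAMETERS) -> str:
    """Return the to_predict column name, or raise if it was not supplied."""
    hyperparameters = json.loads(Path(config_path).read_text())
    try:
        return hyperparameters["to_predict"]
    except KeyError:
        raise ValueError("the to_predict hyperparameter is required") from None


# Local demonstration: a temporary file stands in for the SageMaker path.
demo_path = Path(tempfile.mkdtemp()) / "hyperparameters.json"
demo_path.write_text(json.dumps({"to_predict": "Class"}))
print(load_to_predict(demo_path))  # prints: Class
```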

Inference

You can create a model, an endpoint configuration, and an endpoint using the AWS Management Console.

Create model

Choose a role that has the AmazonSageMakerFullAccess IAM policy attached. Next, provide the location of the model artifacts and the inference code.

  • Location of inference code image: The location of the ECR image, e.g. 846763053924.dkr.ecr.us-east-1.amazonaws.com/mindsdb_impl:latest
  • Location of model artifacts (optional): The S3 path where the models are saved. This is the same location that you provided for the training job, e.g. s3://bucket/path-to-write-models/
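For reference, the same model can be described through the boto3 `create_model` call. A SageMaker training job stores its artifacts as model.tar.gz under `<output path>/<job name>/output/`, and the sketch below derives that URL. Both helpers, the job name, and the role ARN are hypothetical placeholders.

```python
def model_artifact_url(output_path: str, job_name: str) -> str:
    """S3 URL of the archive written by training job `job_name`."""
    return "{}/{}/output/model.tar.gz".format(output_path.rstrip("/"), job_name)


def model_request(model_name, image_uri, model_data_url, role_arn):
    """Build keyword arguments for the SageMaker create_model call."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # inference code image
            "ModelDataUrl": model_data_url,  # model artifacts
        },
        "ExecutionRoleArn": role_arn,
    }


request = model_request(
    model_name="mindsdb-impl-model",                              # hypothetical
    image_uri="846763053924.dkr.ecr.us-east-1.amazonaws.com/mindsdb_impl:latest",
    model_data_url=model_artifact_url("s3://bucket/path-to-write-models/",
                                      "mindsdb-example-job"),     # hypothetical job
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",        # hypothetical
)

# To create the model (requires AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_model(**request)
```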

Create endpoint

First, create an endpoint configuration. In the configuration, specify which models to deploy and the hardware requirements for each. The only required option is the Endpoint configuration name; after setting it, add the previously created model. Then go to Create and configure endpoint, enter the Endpoint name, and attach the endpoint configuration. It usually takes a few minutes to start the instance and create the endpoint.
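The console steps above correspond to two boto3 calls followed by a wait. The sketch below assumes a client that behaves like `boto3.client("sagemaker")`; the helper names, the variant name, and the instance settings are illustrative choices, not values taken from this repository.

```python
def endpoint_config_request(config_name, model_name,
                            instance_type="ml.m4.xlarge"):
    """Build keyword arguments for the create_endpoint_config call."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",   # illustrative variant name
            "ModelName": model_name,       # the previously created model
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    }


def create_endpoint(sm_client, endpoint_name, config_name, model_name):
    """Create the configuration and the endpoint, then wait for InService."""
    sm_client.create_endpoint_config(
        **endpoint_config_request(config_name, model_name))
    sm_client.create_endpoint(EndpointName=endpoint_name,
                              EndpointConfigName=config_name)
    # Blocks until the endpoint is in service; this can take several minutes.
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)


# Usage (requires AWS credentials):
#   import boto3
#   create_endpoint(boto3.client("sagemaker"), "mindsdb-impl",
#                   "mindsdb-impl-config", "mindsdb-impl-model")
```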

Call endpoint

When the endpoint is in the InService status, you can create a Python script or notebook to get predictions from it.

import boto3

endpointName = 'mindsdb-impl'

# Read the test dataset
with open('diabetes-test.csv', 'r') as reader:
    payload = reader.read()

# Talk to SageMaker
client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
    EndpointName=endpointName,
    Body=payload,
    ContentType='text/csv',
    Accept='application/json'  # expected MIME type of the response
)
print(response['Body'].read().decode('ascii'))

The MindsDB prediction response:

{
  "prediction": "* We are 96% confident the value of \"Class\" is positive.",
  "class_confidence": [0.964147493532568]
}

Alternatively, you can use the call.py CLI located under local_test, e.g.:

python3 call.py --endpoint mindsdb-impl --dataset test_data/diabetes-test.json --content-type application/json
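The source of call.py is not reproduced here. As a rough idea of what a CLI with these three flags could look like, here is a hypothetical sketch (not the repository's actual script) that parses the arguments and forwards the dataset to a client behaving like `boto3.client("sagemaker-runtime")`. The `text/csv` default is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags shown in the command above."""
    parser = argparse.ArgumentParser(
        description="Send a dataset file to a SageMaker endpoint.")
    parser.add_argument("--endpoint", required=True,
                        help="name of the SageMaker endpoint")
    parser.add_argument("--dataset", required=True,
                        help="path to the file with the rows to predict")
    parser.add_argument("--content-type", default="text/csv",
                        help="MIME type of the dataset file")
    return parser.parse_args(argv)


def invoke(runtime_client, args) -> str:
    """Send the dataset to the endpoint and return the response body."""
    with open(args.dataset, "rb") as dataset_file:
        body = dataset_file.read()
    response = runtime_client.invoke_endpoint(
        EndpointName=args.endpoint,
        Body=body,
        ContentType=args.content_type,
    )
    return response["Body"].read().decode("utf-8")


# Usage (requires AWS credentials):
#   import boto3
#   print(invoke(boto3.client("sagemaker-runtime"), parse_args()))
```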

Using the SageMaker Python SDK

SageMaker provides an Estimator implementation that runs SageMaker-compatible custom Docker containers, which makes it possible to use our own MindsDB implementation.

Starting train job

The Estimator defines how you can use the container to train. This is a simple example that includes the configuration required to start training:

import sagemaker as sage

# Add the AmazonSageMaker execution role ARN here
role = "arn:aws:iam:"

sess = sage.Session()
account = sess.boto_session.client('sts').get_caller_identity()['Account']
bucket_path = "s3://mdb-sagemaker/models/"
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/mindsdb_lts:latest'.format(account, region)

# The to_predict hyperparameter is required for the MindsDB container
mindsdb_impl = sage.estimator.Estimator(image,
                       role, 1, 'ml.m4.xlarge',
                       output_path=bucket_path,
                       sagemaker_session=sess,
                       base_job_name="mindsdb-lts-sdk",
                       hyperparameters={"to_predict": "Class"})

dataset_location = 's3://mdb-sagemaker/diabetes.csv'
mindsdb_impl.fit(dataset_location)

Deploy model and create endpoint

The model can be deployed to SageMaker by calling the deploy method.

predictor = mindsdb_impl.deploy(1, 'ml.m4.xlarge', endpoint_name='mindsdb-impl')

The deploy method configures the Amazon SageMaker hosting services endpoint, deploys the model, and launches the endpoint to host it. It returns a RealTimePredictor object, which you can use to get predictions.

with open('test_data/diabetes-test.csv', 'r') as reader:
    when_data = reader.read()
print(predictor.predict(when_data).decode('utf-8'))

The predict endpoint accepts test datasets in CSV, JSON, and Excel formats.
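When invoking the endpoint directly, the request has to declare which of these formats it carries. The sketch below picks a content type from the file extension. `text/csv` and `application/json` appear elsewhere in this document; the two Excel MIME types are the standard ones, and it is an assumption that the container accepts them under these names.

```python
from pathlib import Path

# File extension -> MIME type. The Excel entries are assumptions (see above).
CONTENT_TYPES = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def content_type_for(path: str) -> str:
    """Return the content type to send for a dataset file."""
    suffix = Path(path).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError("unsupported dataset format: " + repr(suffix)) from None


print(content_type_for("test_data/diabetes-test.csv"))   # prints: text/csv
print(content_type_for("test_data/diabetes-test.json"))  # prints: application/json
```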

Delete the endpoint

Don't forget to delete the endpoint when you are not using it.

sess.delete_endpoint('mindsdb-impl')

Other useful resources