/reinvent2019-aim362-sagemaker-debugger-model-monitor

Build, train & debug, and deploy & monitor with Amazon SageMaker

Primary LanguageJupyter NotebookApache License 2.0Apache-2.0

Build, train & debug, and deploy & monitor with Amazon SageMaker

Introduction

Amazon SageMaker is a fully managed service that removes the heavy lifting from each step of the machine learning workflow, and provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. In this interactive workshop, we will work on the different aspects of the ML workflow to build, train, and deploy a model using all the capabilities of Amazon SageMaker including the ones that we announced at re:Invent 2019. We will use the Amazon SageMaker to build, train & debug models with Amazon SageMaker Debugger, and deploy & monitor with Amazon SageMaker Model Monitor. Let’s build together!

Datasets

In this workshop, we will go through the steps of training, debugging, deploying and monitoring a network traffic classification model.

For training our model we will be using datasets CSE-CIC-IDS2018 by CIC and ISCX which are used for security testing and malware prevention. These datasets include a huge amount of raw network traffic logs, plus pre-processed data where network connections have been reconstructed and relevant features have been extracted using CICFlowMeter, a tool that outputs network connection features as CSV files. Each record is classified as benign traffic, or it can be malicious traffic, with a total number of 15 classes.

The goal is to demonstrate how to execute training of a network traffic classification model using the Amazon SageMaker framework container for XGBoost, training and debugging. Once trained how to then deploy and monitor the model performance.

Getting started

Initially have an open AWS account, with privileges to create and run Amazon SageMaker notebooks and access to S3 buckets.

You can run this workshop in all commercial AWS regions where Amazon SageMaker is GA.

Create a managed Jupyter Notebook instance

First, let's create an Amazon SageMaker managed Jupyter notebook instance. An Amazon SageMaker notebook instance is a fully managed ML compute instance running the Jupyter Notebook application. Amazon SageMaker manages creating the instance and related resources.

  1. In the AWS Management Console, click on Services, type “SageMaker” and press enter.

    Search SageMaker
  2. You’ll be placed in the Amazon SageMaker dashboard. Click on Notebook instances either in the landing page or in the left menu.

    SageMaker dashboard
  3. Once in the Notebook instances screen, click on the top-righ button Create notebook instance.

    Notebook Instances screen
  4. In the Create notebook instance screen

    Create Notebook Instance screen
    1. Give the Notebook Instance a name like aim362-workshop or what you prefer

    2. Choose ml.t2.medium as Notebook instance type

    3. In the IAM role dropdown list you need to select an AWS IAM Role that is configured with security policies allowing access to Amazon SageMaker (full access) and Amazon S3 (default SageMaker buckets). If you don't have any role with those privileges, choose Create New Role and configure the role as follows:

      Create Notebook Instance Role
    4. Keep No VPC selected in the VPC dropdown list

    5. Keep No configuration selected in the Lifecycle configuration dropdown list

    6. Keep No Custom Encryption selected in the Encryption key dropdown list

    7. Finally, click on Create notebook instance

  5. You will be redirected to the Notebook instances screen and you will see a new notebook instance in Pending state.

    Notebook instance pending

    Wait until the notebook instance is status is In Service and then click on the Open Jupyter Lab button to be redirected to Jupyter Lab.

    Notebook instance in service

    The Jupyter Lab interface will load, as shown below.

    Jupyter Lab screen

Download workshop code to the notebook instance

All the code of this workshop is implemented and available for download from this GitHub repository.

As a consequence, in this section we will clone the GitHub repository into the Amazon SageMaker notebook instance and access the Jupyter Notebooks to run the workshop.

  1. From the file menu, click on New > Terminal

    Jupyter New Terminal tab

    This will open a terminal tab in the Jupyter Lab interface

    Jupyter Terminal Tab
  2. Execute the following commands in the terminal

    cd SageMaker/
    git clone https://github.com/aws-samples/reinvent2019-aim362-sagemaker-debugger-model-monitor.git
    
  3. When the clone operation completes, the folder reinvent2019-aim362-sagemaker-debugger-model-monitor will appear automatically in the file browser on the left (if not, you can hit the Refresh button)

    Jupyter Cloned Workshop Screen
  4. Browse to the folder 01_train_and_debug and open the file train_and_debug.ipynb to get started.

Modules

This workshops consists of 2 modules:

You must comply with the order of modules, since the outputs of a module are inputs of the following one.

License

The contents of this workshop are licensed under the Apache 2.0 License.

Authors

Giuseppe A. Porcelli - Principal, ML Specialist Solutions Architect - Amazon Web Services EMEA
Paul Armstrong - Principal Solutions Architect - Amazon Web Services EMEA