
Getting Started with Amazon SageMaker

This repository accompanies a hands-on training event to introduce data scientists (and ML-ready developers / technical leaders) to core model training and deployment workflows with Amazon SageMaker.

Agenda

Sessions in suggested order:

  • builtin_algorithm_hpo_tabular: Demonstrating how to use (and tune the hyperparameters of) a pre-built, SageMaker-provided algorithm (Applying XGBoost to tabular data)
  • (Optional) custom_sklearn_rf: Introductory example showing how to bring your own algorithm, using SageMaker's SKLearn container environment as a base (Predicting housing prices)
  • custom_tensorflow_keras_nlp: Demonstrating how to bring your own algorithm, using SageMaker's TensorFlow container environment as a base (Classifying news headline text)
  • migration_challenge_keras_image: A challenge to use what you've learned to migrate an existing notebook to a SageMaker training job and a real-time inference endpoint (Classifying MNIST digit images)

Deploying in Your Own Account

Our standard setup for this workshop is detailed in .ee.tpl.yaml, a CloudFormation template file. You can deploy the same stack in your own account via the AWS CloudFormation Console.
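If you prefer the AWS CLI to the console, the deployment could be sketched as follows. This assumes you've cloned the repository locally and have credentials configured; the stack name here is an arbitrary example, and whether the IAM capability flag is needed depends on the resources the template creates:

```shell
# From a local clone of this repository, deploy the workshop template.
# "sm-workshop-101" is an example stack name - choose your own.
aws cloudformation deploy \
  --template-file .ee.tpl.yaml \
  --stack-name sm-workshop-101 \
  --capabilities CAPABILITY_NAMED_IAM   # needed if the template creates IAM roles
```

You can monitor stack creation progress in the CloudFormation Console as usual.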

If you've onboarded to SageMaker Studio and would like to use that instead of a Notebook Instance, you'll need to take the following additional steps:

  1. To download this repository, launch a System terminal (from the Other section of the launcher screen) and run git clone https://github.com/apac-ml-tfc/sagemaker-workshop-101.
  2. To enable the widgets in the NLP example, navigate your System terminal to the folder with cd sagemaker-workshop-101 and run ./init-studio.sh. Note this must be run from a System terminal and not an Image terminal (the other option on the launcher screen). Refresh your browser window when the script completes.
  3. You'll be asked to select a kernel when you first open each notebook, because the available kernels in Studio differ from those in Notebook Instances. Use Python 3 (Data Science) as standard and Python 3 (TensorFlow CPU Optimized) specifically for the 'local' notebooks in NLP and migration challenge folders (which fit TensorFlow models within the notebook itself).
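Steps 1 and 2 above can be run together in a Studio System terminal, roughly as follows:

```shell
# In a SageMaker Studio *System* terminal (Launcher > Other section),
# NOT an Image terminal:
git clone https://github.com/apac-ml-tfc/sagemaker-workshop-101
cd sagemaker-workshop-101
./init-studio.sh   # enables the notebook widgets used in the NLP example
# Refresh your browser window once the script completes.
```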

You can refer to the "How Are Amazon SageMaker Studio Notebooks Different from Notebook Instances?" docs page for more details on differences between the Studio and Notebook Instance environments. As that page notes, SageMaker Studio does not yet support local mode, which we find can be useful to accelerate debugging in the migration challenge, and is one reason we typically run this session on Notebook Instances instead.