Introduction to Amazon SageMaker

Introduction

In this lesson, we'll learn about Amazon SageMaker, and explore some of the common use cases it covers for data scientists.

Objectives

  • List the use cases of Amazon SageMaker

What is SageMaker?

SageMaker is a platform created by Amazon to centralize its various services related to data science and machine learning. If you're a data scientist working on AWS, chances are that you'll be spending most (if not all) of your time in SageMaker. You can get to SageMaker by searching for "SageMaker" in the search bar at the top of the AWS Console.

SageMaker Use Cases

When you visit the page for SageMaker, you'll notice a graphic highlighting the various use cases SageMaker can help with.

You'll also notice these same categories in the sidebar on the left side of the screen, with more detailed links to the services that fall under each category.

Here's a brief explanation of what each of these service areas is used for in a professional data science setting.

Ground Truth

One of the hardest, most expensive, and most tedious parts of data science is getting the labels needed for supervised learning projects. For projects inside companies, it's quite common to start by gathering the proprietary data needed to train a model that can answer the business question and/or provide the service your company needs. One of the major use cases SageMaker covers is a well-structured way to manage data labeling projects. SageMaker Ground Truth allows you to manage private teams, in case your information is sensitive, or to manage public teams by leveraging Amazon Mechanical Turk, which crowdsources labels from an army of public contractors who have signed up and are paid by the label.

Ground Truth also offers an automated labeling feature that uses machine learning models to generate labels in a human-in-the-loop format, where only labels that are above a particular confidence threshold (which you set yourself) are auto-generated by the model. This allows your contractors to focus only on the tough examples, and saves you from paying as much to label the easy examples that a model can handle.
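To make the confidence-threshold idea concrete, here is a minimal sketch in plain Python. It is not Ground Truth's API or its actual implementation; the `Prediction` type, the `route_predictions` function, and the sample data are all hypothetical names made up for illustration. The sketch assumes a model has already produced a label and a confidence score for each item, and that a label is accepted when its confidence is at or above the threshold.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """A hypothetical model output for one unlabeled item."""
    item_id: str
    label: str
    confidence: float  # model's confidence in `label`, between 0 and 1


def route_predictions(predictions, threshold):
    """Split predictions into auto-labeled items and items that need a human."""
    auto_labeled = {}   # item_id -> label accepted from the model
    needs_human = []    # item_ids to send to the labeling team
    for p in predictions:
        if p.confidence >= threshold:
            auto_labeled[p.item_id] = p.label
        else:
            needs_human.append(p.item_id)
    return auto_labeled, needs_human


# Made-up example data, not real Ground Truth output
predictions = [
    Prediction("img-001", "cat", 0.98),
    Prediction("img-002", "dog", 0.55),
    Prediction("img-003", "cat", 0.91),
]

auto_labeled, needs_human = route_predictions(predictions, threshold=0.90)
print(auto_labeled)  # {'img-001': 'cat', 'img-003': 'cat'}
print(needs_human)   # ['img-002']
```

Raising the threshold sends more items to human labelers (higher cost, fewer model mistakes), while lowering it does the opposite.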

Notebooks

These are exactly what they sound like -- cloud-based Jupyter notebooks, a data scientist's 'bread and butter'! SageMaker notebooks are just like regular Jupyter notebooks, with a bit more added functionality. For instance, it's quite easy to choose from a bunch of pre-configured kernels to select which version of Python/TensorFlow/etc. you want to use. You can start a notebook from scratch inside SageMaker and do all of your work in the cloud, or you can upload preexisting notebooks into SageMaker, allowing you to do your work on a local machine and move it over to the cloud when you're ready for training!

We strongly recommend you take a minute to poke around inside a SageMaker notebook to get a feel for what it looks like and what it can do. They're pretty amazing!

Training

SageMaker's training services allow you to easily leverage cloud computing with AWS's specialized GPU servers, allowing you to train massive models that would be impractical to train on a local machine. There are a ton of configuration options, and you can easily set budgets, limits, and training times, and even auto-tune your hyperparameters! Although this is outside the scope of our lessons on AWS, Amazon provides some pretty amazing (and fast!) tutorials about how to use more specific services like cloud training or model tuning once you've completed this section!

Inference

Arguably the most important part of the data science pipeline, Inference services focus on allowing you to create endpoints so that people can consume your models over the internet! One of the handiest parts of SageMaker's approach to inference is the fact that you can productionize your own model, or just use one of theirs! While there are certainly times when you'll need to create, train, and host your own model, AWS has made things simple by allowing you to use their own models and charging you on a per-use basis. For instance, let's say that you needed to make some time series forecasts. While you could go down the very complicated route of building your own model, you could also just make use of SageMaker's built-in DeepAR forecasting algorithm, which uses recurrent neural networks to make forecasts on your data.
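As a rough sketch of what consuming an endpoint can look like, the code below builds a JSON request in the format documented for DeepAR inference and sends it with boto3's `invoke_endpoint` call. The helper names `build_deepar_request` and `request_forecast`, the sample series, and the endpoint name are assumptions made for illustration. The sketch also assumes a DeepAR model has already been trained and deployed to an endpoint, and that boto3 and AWS credentials are set up, so the actual network call is left commented out.

```python
import json


def build_deepar_request(start, target, quantiles=("0.1", "0.5", "0.9"), num_samples=50):
    """Build a JSON request body for a DeepAR endpoint (hypothetical helper)."""
    payload = {
        # One entry per time series: its start timestamp and observed values
        "instances": [{"start": start, "target": list(target)}],
        "configuration": {
            "num_samples": num_samples,
            "output_types": ["mean", "quantiles"],
            "quantiles": list(quantiles),
        },
    }
    return json.dumps(payload)


def request_forecast(endpoint_name, body):
    """Send the request to a deployed endpoint (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the payload builder works without boto3 installed

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=body,
    )
    return json.loads(response["Body"].read())


# Made-up series: four observations starting at midnight on 2023-01-01
body = build_deepar_request("2023-01-01 00:00:00", [12.0, 15.5, 14.2, 16.8])
print(body)

# "my-deepar-endpoint" is a hypothetical endpoint name; uncomment once one exists.
# forecast = request_forecast("my-deepar-endpoint", body)
```

Keeping the payload builder separate from the network call means you can inspect and test the request locally before paying for any endpoint usage.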

Summary

In this lesson, we learned about Amazon SageMaker, and explored some of the common use cases it covers for data scientists!