amazon-sagemaker-devops-with-ml

Workshop content for applying DevOps practice to Machine Learning workloads using Amazon SageMaker

Primary language: Python. License: Apache-2.0.

Amazon SageMaker MLOps

The workshops in this repository focus on demonstrating MLOps capabilities: applying core DevOps practices to Machine Learning workloads using AWS Developer Tools combined with Amazon SageMaker.

Background

Applying DevOps practices to Machine Learning (ML) workloads is fundamental to ensuring that models are deployed through a reliable and consistent process, and to establishing a strategy for retraining those models.

DevOps is not simply about automating the deployment of a model. It is about applying practices such as Continuous Integration (CI) and Continuous Delivery (CD) to the development lifecycle. These practices rely on automation, but automation alone is not synonymous with CI/CD. In this lab, you will create a deployment pipeline using AWS Developer Tools and SageMaker to demonstrate how CI/CD can be applied to machine learning workloads. There is no one-size-fits-all model for creating a pipeline; however, the general concepts explored in this workshop can be applied across various services and tools to achieve the same end result.
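To illustrate how CI/CD differs from plain automation, the sketch below models a pipeline as an ordered list of stages in which any failing stage halts everything downstream. It is a minimal plain-Python sketch, not the workshop's pipeline: `run_pipeline`, `train`, `evaluate`, `deploy`, and the `min_accuracy` threshold are hypothetical stand-ins for real SageMaker training, evaluation, and deployment steps.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a (name, callable) pair. The callable receives a shared context
# dict and returns True on success. All names here are hypothetical placeholders.
Stage = Tuple[str, Callable[[Dict], bool]]


def run_pipeline(stages: List[Stage], context: Dict) -> List[str]:
    """Run stages in order, stopping at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, stage in stages:
        if not stage(context):
            # A failed stage blocks every downstream stage, e.g. a model that
            # fails evaluation never reaches deployment.
            print(f"Stage '{name}' failed; halting pipeline.")
            break
        completed.append(name)
    return completed


def train(ctx: Dict) -> bool:
    ctx["model"] = "model-v1"  # stand-in for a real training job
    return True


def evaluate(ctx: Dict) -> bool:
    ctx["accuracy"] = 0.72  # stand-in for a real evaluation metric
    return ctx["accuracy"] >= ctx["min_accuracy"]


def deploy(ctx: Dict) -> bool:
    ctx["endpoint"] = f"endpoint-for-{ctx['model']}"  # stand-in for a deployment
    return True


if __name__ == "__main__":
    stages = [("train", train), ("evaluate", evaluate), ("deploy", deploy)]
    ctx = {"min_accuracy": 0.8}
    # Evaluation fails against the 0.8 threshold, so deploy never runs.
    print(run_pipeline(stages, ctx))
```

The point of the design is that deployment is reachable only through the gates in front of it; automating `deploy` on its own would not give you that guarantee.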

Core Tenets

  • Versioning: For Machine Learning workloads, versioning of artifacts includes the standard best practices for versioning code and containers, but also extends to versioning data, algorithms, and model artifacts.

  • Quality Gates: Ensuring a minimum level of model quality before introducing a model into production. Gates can span multiple components that measure readiness to move to a target environment, such as model evaluation, system monitoring, and business impact evaluation.

  • Automated Deployment: Automating the deployment of your model for predictions ensures consistency and reliability.

  • Automatic Retraining: Automated retraining, based on the strategy defined for your model, is key to counteracting model drift and taking advantage of new ground truth data.
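The versioning, quality gate, and retraining tenets can be made concrete with a few small helpers. The following is a minimal, framework-agnostic sketch, not the workshop's implementation. It assumes that artifacts are identified by SHA-256 content hashes, that a quality gate is a set of minimum metric thresholds, and that retraining is triggered when accuracy on newly labeled data drops by more than a fixed margin. `ModelVersion`, `passes_quality_gate`, `should_retrain`, the placeholder commit id, and the 0.05 default margin are hypothetical choices for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def content_hash(data: bytes) -> str:
    """Return a SHA-256 digest used as an immutable version identifier."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ModelVersion:
    # Hypothetical record linking a model artifact to everything that produced it:
    # code, data, algorithm, and the resulting model bytes.
    code_commit: str
    data_hash: str
    algorithm: str
    model_hash: str


def passes_quality_gate(metrics: dict, thresholds: dict) -> bool:
    """True only if every required metric is present and meets its minimum."""
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


def should_retrain(baseline_accuracy: float, recent_accuracy: float,
                   max_drop: float = 0.05) -> bool:
    """Signal retraining when accuracy on new ground truth drops too far
    below the accuracy measured at deployment time (hypothetical strategy)."""
    return baseline_accuracy - recent_accuracy > max_drop


if __name__ == "__main__":
    training_data = b"feature_a,feature_b,label\n1,2,0\n3,4,1\n"
    version = ModelVersion(
        code_commit="abc1234",      # placeholder commit id
        data_hash=content_hash(training_data),
        algorithm="xgboost",        # placeholder algorithm name
        model_hash=content_hash(b"serialized-model-bytes"),
    )
    print(json.dumps(asdict(version), indent=2))
    print(passes_quality_gate({"auc": 0.91}, {"auc": 0.85}))
    print(should_retrain(0.90, 0.82))
```

Treating a missing metric as failing the gate is a deliberately conservative choice: a model whose evaluation did not produce a required metric should not be promoted.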

Workshop Contents

This repository demonstrates the following use cases covering reference pipelines for common scenarios. These are reference pipelines only and should be adjusted for specific use cases.

License

This library is licensed under the Apache 2.0 License.