mlops-amazon-sagemaker-devops-with-ml

Workshop content for applying DevOps practices to Machine Learning workloads using Amazon SageMaker

Primary language: Python. License: Apache-2.0

Amazon SageMaker MLOps

The labs contained in this repository focus on applying MLOps practices to Machine Learning (ML) workloads, using Amazon SageMaker as the underlying service for model development, training, and hosting. The repository is organized around the standard practices we see at each stage of adoption of ML workloads.

Stages of Adoption

Background

MLOps refers to a methodology built on applying DevOps practices to machine learning workloads. DevOps focuses on the intersection of the development and operations disciplines to streamline software delivery across the Software Development Lifecycle (SDLC). MLOps combines data science and data engineering with existing DevOps practices to streamline model delivery across the Machine Learning Development Lifecycle (MLDC).

MLOps is the discipline of integrating ML workloads into release management, CI/CD, and operations. At many companies, ML projects start with data scientists operating within research teams that are isolated from other teams. As ML projects prove their value with initial proof-of-concept results, companies naturally want to move them into production. Just as DevOps integrates software development and operations, MLOps requires the integration of software development, operations, security, data engineering, and data science.

Objective

The goal of applying MLOps practices to ML workloads is to help customers accelerate adoption and optimize the building, deployment, and operation of those workloads. There is a set of practices and characteristics we see adopted as customers move from the manual stages typical of initial projects to adopting ML workloads at scale.

Let's look at each stage of adoption at a high level...

Manual

The Manual stage is where we typically see customers beginning to incorporate ML projects into their overall strategy to drive business outcomes. At this stage, the MLDC typically involves a lot of manual hand-offs and processes.

01-Project

Technical Focus:

  • Team education (Machine Learning, AWS Services)
  • Building a model that provides business value
  • Ability to collaborate and share assets
  • Focus on core capabilities of building, training, and deploying a model
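
At this stage the build/train/deploy loop is typically run by hand, step by step. The sketch below illustrates that loop with the standard library only; the names (`train`, `deploy`, `predict`) and the trivial mean-predictor "model" are hypothetical stand-ins, not SageMaker APIs:

```python
# Illustrative only: a hand-run build/train/deploy loop, standing in for the
# notebook-driven workflow of the Manual stage. Not a SageMaker API.
from statistics import mean

def train(examples):
    """'Train' a trivial model: predict the mean of the labels seen."""
    return {"mean_label": mean(label for _, label in examples)}

def deploy(model):
    """'Deploy' by returning a callable inference function."""
    def predict(features):
        # A real endpoint would use the features; this baseline ignores them.
        return model["mean_label"]
    return predict

# Each step below is a manual hand-off in the Manual stage:
data = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
model = train(data)        # run by the data scientist
endpoint = deploy(model)   # handed off to an engineer
print(endpoint((4.0,)))    # 4.0
```

Every arrow in this loop is a person running a step; the later stages are about replacing those hand-offs with automation.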

Getting Started Resources:

Repeatable

The Repeatable stage is where we see customers increasing the number of ML models they build, deploy, and manage in production environments. At this stage, the focus is on automating the pipelines required to provide a repeatable mechanism for deploying to target environments.

02-Repeatable

Technical Focus:

  • Automating data, training, and deployment pipelines
  • Reducing the number of manual hand-offs
  • Collaborating across cross-functional teams, including necessary stakeholders (e.g., Security, Compliance)
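
The core idea of this stage can be sketched in a few lines: steps are declared once and executed in order, so every run is identical and no output is handed off by hand. The `Pipeline` class and step functions below are hypothetical illustrations, not the SageMaker Pipelines API:

```python
# A minimal sketch of the Repeatable-stage idea: declared steps, executed in
# order, with each step's output feeding the next. Illustrative only.
from typing import Callable, List

class Pipeline:
    def __init__(self, steps: List[Callable]):
        self.steps = steps

    def run(self, payload):
        # Each step receives the previous step's output - no manual hand-offs.
        for step in self.steps:
            payload = step(payload)
        return payload

def prepare(raw):      # data pipeline: normalize the raw values
    return [x / max(raw) for x in raw]

def train(features):   # training pipeline: produce a model artifact
    return {"weights": features}

def deploy(model):     # deployment pipeline: push to a target environment
    return f"deployed model with {len(model['weights'])} weights"

pipeline = Pipeline([prepare, train, deploy])
print(pipeline.run([2, 4, 8]))   # deployed model with 3 weights
```

In practice each step would be a containerized job in a managed pipeline service, but the repeatability property is the same: the pipeline definition, not a person, decides what runs next.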

Resources/Labs:

Reliable

The Reliable stage is where we see customers beginning to apply more mature practices, such as CI/CD and metric-based retraining strategies, to ML workloads. While the Repeatable stage focuses on applying automation to your pipelines, the Reliable stage builds on that foundation by adopting more mature MLOps practices in your pipelines.

03-Reliable

Technical Focus:

  • Integrating higher level MLOps practices such as CI/CD with:

    • Source/Version Control for infrastructure (Infrastructure-as-Code), configuration (Configuration-as-Code), machine learning code, inference code, data, and packaged libraries/artifacts

    • End-to-end traceability (i.e. pipeline traceability, model lineage, data lineage)

    • Automated quality gates

    • Continuous monitoring of impact on business outcomes, model monitoring (e.g., data/concept drift, latency), and system monitoring

    • Built-in feedback loops for model performance and retraining activities

    • Building flexible pipelines supporting advanced deployment scenarios such as A/B Testing, Inference Pipelines, and Multi-Model Endpoints

    • Automating the creation of APIs for consumption

    • Implementing governed self-service environments for experimentation and development
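
An automated quality gate, one of the practices listed above, can be reduced to a simple decision: promote the candidate model only if it clears an absolute metric floor and beats the current production baseline. The function and thresholds below are illustrative assumptions, not a SageMaker API:

```python
# A hedged sketch of an automated quality gate for a CI/CD model pipeline.
# quality_gate, its metric names, and min_accuracy are illustrative only.
def quality_gate(candidate_metrics, baseline_metrics, min_accuracy=0.90):
    """Return True (promote) only if the candidate clears both checks."""
    meets_floor = candidate_metrics["accuracy"] >= min_accuracy
    beats_baseline = candidate_metrics["accuracy"] >= baseline_metrics["accuracy"]
    return meets_floor and beats_baseline

baseline = {"accuracy": 0.91}                       # current production model
candidate = {"accuracy": 0.94}                      # newly trained model
print(quality_gate(candidate, baseline))            # True  -> pipeline continues
print(quality_gate({"accuracy": 0.88}, baseline))   # False -> deployment blocked
```

In a real pipeline this check would run as a condition step after model evaluation, failing the pipeline (and alerting the team) instead of silently deploying a regressed model.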

Resources/Labs:

Solutions:

  • AWS MLOps Framework Solution: This is a one-click-to-deploy solution, published in the AWS Solutions Library, that creates a pre-configured pipeline.

Labs/Workshops:

Optimized

The Optimized stage is where we see customers adopting practices and mechanisms that allow for scaling machine learning projects across teams/organizations. At this stage, the MLDC becomes increasingly reproducible.

04-Optimized
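
Reproducibility at this scale usually hinges on a model registry: every model version is recorded together with the exact code and data that produced it, and only approved versions can be fetched for deployment. The registry below is a pure-Python illustration of that idea, not the SageMaker Model Registry API:

```python
# Illustrative model-registry sketch for the Optimized stage: versions carry
# the inputs needed to reproduce them, with an approval gate for deployment.
class ModelRegistry:
    def __init__(self):
        self._versions = []

    def register(self, artifact_uri, code_commit, data_snapshot):
        version = {
            "version": len(self._versions) + 1,
            "artifact": artifact_uri,
            "code": code_commit,      # exact training code revision
            "data": data_snapshot,    # exact training data snapshot
            "approved": False,
        }
        self._versions.append(version)
        return version["version"]

    def approve(self, version):
        self._versions[version - 1]["approved"] = True

    def latest_approved(self):
        approved = [v for v in self._versions if v["approved"]]
        return approved[-1] if approved else None

registry = ModelRegistry()
v = registry.register("s3://bucket/model.tar.gz", "abc123", "2024-01-data")
registry.approve(v)
print(registry.latest_approved()["version"])   # 1
```

Because each entry pins the code revision and data snapshot, any team can rebuild any approved model, which is what makes the MLDC reproducible across an organization.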

Labs/Workshops:

COMING SOON


License

This library is licensed under the Apache 2.0 License.