effective-guide-mlops

End-to-end machine learning pipeline with Sagemaker Python SDK

This repository provides an example end-to-end machine learning pipeline on AWS build using the Sagemaker Python SDK. It leans on other resources (e.g. here and here), however, it provides a unified end-to-end example in a notebook from data processing to deployment of a REST API. This not production ready, but it will give you a good primary intuition how to orchestrate the ml lifecycle on AWS via the Sagemaker SDK.

The main ressource for this guid is the notebook ml_pipeline.ipynb in the folder notebooks. The easiest way to follow along the tutorial would be to launch a notebook instance on AWS Sagemaker and pull the repository into your jupyterlab environment.

1. Data

The Penguin Dataset from Alison Horst is an alternative to the famous iris dataset that can be used for demonstrating various ml tasks. Read more here.

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
1	Adelie	Torgersen	39.1	18.7	181	3750	male	2007
2	Adelie	Torgersen	39.5	17.4	186	3800	female	2007
3	Adelie	Torgersen	40.3	18	195	3250	female	2007

2. Objective

The goal is to train a classifier that predicts the sex/gender of a penguin based on all other variables available.

3. Ressources

Notebooks:

stored in /notebooks
eda.ipynb visual exploration of the data
ml_pipeline.ipynb orchestrates preprocessing of the data, model training and deployment of the model as endpoint

4 Tutorial Wolkthrough

head over to notebooks.ml_pipeline.ipynb and follow the procedure