/cd0583-diagnose-and-fix

Diagnose and fix problems in a production deployed code.

Primary LanguagePythonMIT LicenseMIT

Diagonsing and Fixing Operational Problems

How to diagnose and fix problems in a production deployed code?

To understand a model's functionality and find its underlying problems, we need to take care of the following things:

  • Periodic training
    • Figure out an optimal retraining strategy.
  • Monitor model performance
  • Analyse
    • Clear visibility of the model helps us and guides the model performance.

Summary

In this section we will diagnose and fix problems in a production deployed code. To that end, we will:

  • Use evidently and mlflow libraries.
  • Deploy a machine learning model in Heroku.
  • Calculate data drift for the model.
  • Use mlflow Tracking for the training experiments indicating data drift.
  • Explore the results using mlflow UI.

Tutorial: Data Drift / mlflow

In this tutorial we will use the Bike Sharing Dataset.

You can use the following code snippet (from train.py) to analyze this data before starting this tutorial.

content = requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip").content

with zipfile.ZipFile(io.BytesIO(content)) as arc:
    raw_data = pd.read_csv(arc.open("day.csv"), header=0, sep=',', parse_dates=['dteday'], index_col='dteday')

# observe data structure
raw_data.tail()

Environment setup

Prerequisites

  • Heroku account
  • GitHub account
  • Clone the Github repo
    https://github.com/udacity/cd0583-diagnose-and-fix.git

Dependencies

  • All the dependencies are listed in the requirements.txt file. You can setup a virtual environment using Anaconda and install the required dependencies there.
  • runtime.txt contains the python version that is used for this tutorial.

Steps

Follow the steps below for deploying this model:

  • Ensure that all the dependencies listed in the requirements.txt file are installed.
  • Run the train.py file to log experiments in mlflow
  • View the results in the mlflow webui

How to fix Data Drift issues

  • If there is substantial data drift then you should reweigh samples in the training data, giving more importance to newer patterns.
  • Identify new segments where the model fails, and create a different model for it. Consider using an ensemble of several models for different segments of the data.
  • Change the prediction target. For example, switch from weekly to daily forecast or replace the regression model with classification into categories from "high" to "low."
  • Pick a different model architecture to account for ongoing drift. You can consider incremental or online learning, where the model continuously adapts to new data.
  • Apply domain adaptation strategies. There are a number of approaches to help the model better generalize to a new target domain.

Extra Resources