/symmetrical-maintenance-broccoli

Analysis and machine learning on the AI4I 2020 Predictive Maintenance Dataset Data Set

Primary language: Jupyter Notebook. License: MIT.

Predictive Maintenance Machine Learning

This is my Machine Learning Engineer capstone project and will serve as a demonstration of end-to-end machine learning.

The problem I want to explore is the prediction of machine failure from maintenance data. From the UCI Machine Learning Repository I have chosen to work with the AI4I 2020 Predictive Maintenance Dataset Data Set. The exploratory data analysis, feature extraction, and modeling will be documented in Jupyter notebooks. All machine learning training will be done on AWS SageMaker.

The final analysis and insights will be documented in a report.

  1. Exploratory Data Analysis - Notebook that explores the data set, draws plots, and computes some simple summary statistics.
  2. Feature Engineering - Notebook that selects features, upsamples the minority class using SMOTE, and adjusts value ranges with a Min/Max scaler.
  3. Linear learner baseline - Notebook that trains a baseline with the AWS SageMaker Linear Learner algorithm.
  4. PyTorch Training Model - Notebook that trains and evaluates a simple neural network model.
    1. pytorch_model_def.py - The neural network model.
    2. train_deploy_pytorch_without_dependencies.py - SageMaker script for training the model and running inference.
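
To give a feel for what a "simple neural network model" for this task could look like, here is a minimal PyTorch sketch of a binary failure classifier and a single training step. It is not the code from pytorch_model_def.py: the class name `FailureClassifier`, the layer sizes, the choice of five input features, and the random batch are all assumptions made for illustration.

```python
import torch
from torch import nn


class FailureClassifier(nn.Module):
    """Hypothetical stand-in for the model in pytorch_model_def.py."""

    def __init__(self, n_features: int, hidden_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one logit: failure vs. no failure
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Drop the trailing dimension so the output shape is (batch,)
        return self.net(x).squeeze(-1)


torch.manual_seed(0)

# Random batch standing in for scaled sensor features (assumed: 5 features)
features = torch.rand(8, 5)
labels = torch.randint(0, 2, (8,)).float()

model = FailureClassifier(n_features=5)
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step: forward pass, loss, backward pass, parameter update
logits = model(features)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Convert logits to failure probabilities for evaluation
probabilities = torch.sigmoid(logits.detach())
```

Returning raw logits and pairing them with `BCEWithLogitsLoss` is a common, numerically stable choice for binary classification; the actual model may be structured differently.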

PDF documents describing the machine learning experiment:

Tech Stack

Python, NumPy, Pandas, Matplotlib, Seaborn, Jupyter, PyTorch, AWS SageMaker, Imbalanced-learn

Imbalanced dataset

The data set is highly imbalanced: of its 10000 samples, the target column Machine failure has 9661 (96.61%) false values and 339 (3.39%) failures, attributed to the five failure modes.
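
The arithmetic behind these counts shows why plain accuracy is a poor metric here. The short sketch below uses only the two counts stated above; the variable names are mine.

```python
# Class counts for "Machine failure" as stated above
n_ok = 9661
n_failed = 339
n_total = n_ok + n_failed

failure_rate = n_failed / n_total    # share of positive samples
imbalance_ratio = n_ok / n_failed    # negatives per positive

# A model that always predicts "no failure" gets every negative right
# and every positive wrong.
baseline_accuracy = n_ok / n_total
baseline_recall = 0 / n_failed

print(f"failure rate:      {failure_rate:.4f}")
print(f"imbalance ratio:   {imbalance_ratio:.1f} : 1")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"baseline recall:   {baseline_recall:.4f}")
```

A classifier that never predicts a failure would score about 96.6% accuracy while detecting no failures at all, so metrics such as recall, precision, or F1 on the failure class are more informative.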

I chose to use the Imbalanced-learn library, which provides tools for dealing with imbalanced classes.

References