/pmx_data

Documentation and code for predictive maintenance data and access scripts.

Primary language: Python. License: MIT.

Predictive Maintenance Metadata Repository

Predictive Maintenance poses a number of challenges for machine learning. The general types of machine learning problems encountered in predictive maintenance are:

  1. Time to failure prediction
  2. Anomaly detection
  3. Clustering
  4. Fault detection and root cause analysis

The purpose of this repository is to help researchers start working on predictive maintenance quickly. It provides an overview of relevant predictive maintenance data and 'quick start' scripts for researchers. This is a metadata repository, so it does not contain the data itself--only information about the data, and scripts for downloading and working with it.

Table of Contents

  1. Video Walkthrough
  2. Overview
  3. List of Datasets
  4. Downloading a Dataset
  5. Adding a Dataset
  6. Additional Resources
  7. Wishlist

Video Walkthrough

A short 5-minute video walkthrough is available here

Overview

[Figures: equipment types, PMx tasks, ML algorithms]

List of Datasets

| Dataset | Description | Problems | Equipment Type | Size (GB) | Features, Rows | Note |
| --- | --- | --- | --- | --- | --- | --- |
| maintnet | Maintenance action write-ups for aircraft, automotive, and facility domains | Language model, event forecasting | Aircraft, automotive, and facility | | | |
| Autonomous Underwater Vehicle | Time series measurements from an underwater vehicle with 5 fault types | Time Series Diagnostics / Fault Classification | Maritime | 0.025 | 17, 1225 | rar data compression - requires additional setup to download (see comments in get_data.sh) |
| CNC Mill Tool Wear | Time series machining data across 18 CNC milling experiments | Time Series Fault Detection | Tools & Machinery | 0.011 | 47, 18 | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| Delta Robot | Time series sensor data for a robot used in a production line | Time Series Anomaly Detection | Robotics | 0.000348 | | |
| Diesel Engine Faults | Diesel engine data from failure scenarios across 4 operating states | Diagnostics / Fault Classification | Engines | 0.0061 | | Data is in .mat format |
| electrical fault detection | Line currents and voltages of electrical system with 4 fault conditions | Fault Detection and Classification | Electrical | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| gearbox fault detection | Bench test gearbox with 7 induced faults to detect in 3 channels, high-frequency time series | Fault Detection | Gearboxes & Mechanisms | | | |
| gearbox fault diagnosis | Vibration measurements from healthy and faulty gearboxes with varying loads and recording frequencies | Fault detection | Gearboxes & Mechanisms | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| hdd data | 2013 Backblaze hard drive failure data | Time to failure prediction | Computer HW & IOT | | | |
| hydraulic sensor system | Time series sensor readings on a hydraulic test rig with target condition values | Fault Detection and Classification | Tools & Machinery | | | |
| li-ion battery aging | Data from tests on 4 Lithium-Ion batteries cycled under random currents | Time to failure prediction | Batteries | | | |
| machinery faults datasets | Extremely large dataset of simulated time series machinery data with 6 operating states, each of which can have several fault types | Fault classification | Tools & Machinery | | | |
| Maintenance of naval propulsion plants | Synthetic gas turbine data | Time to failure prediction | Maritime | | | |
| nasa milling prognostic dataset | Investigating wear on a milling machine under several runs in various operating conditions | Fault Detection | Tools & Machinery | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| one year industrial component degradation | High-frequency time series data documenting degradation of an industrial component over the course of a year | Time to failure prediction | Tools & Machinery | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| plant fault detection | Time series measurement from 70 plants with 6 fault types | Fault detection | Gearboxes & Mechanisms | | | |
| pmx for aircraft | Machine and component telemetry time series data and maintenance records for aircraft | Time to failure prediction | Aircraft | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| pmx for ga | Per-second sensor data from flights of a Cessna 172S preceding maintenance | Time to failure prediction | Aircraft | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| pmx from elevator industry | Data on elevator ball bearing wear | Time to failure prediction | Tools & Machinery | | | |
| pmx iot sensor | Data on failure of heat exchangers on an assembly line | Time to failure prediction | Tools & Machinery | | | |
| prediction of downtime duration | Predicting downtime duration of car manufacturing assembly lines | Time to event prediction | Tools & Machinery | | | |
| predictive maintenance fault classification | Sensor data from a drill press under induced fault conditions | Fault detection and classification | Tools & Machinery | | | |
| production plant data for condition monitoring | Sensor data from 8 run-to-failure experiments on a production line component | Time to failure prediction | Tools & Machinery | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| pump sensor | Time series sensor readings from a water pump with status | Anomaly detection | Tools & Machinery | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| robot execution failures | Sensor data from a robot after 5 different types of failures | Time to failure prediction | Robotics | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| Air pressure system failures in Scania trucks | Classifying whether truck failure is a result of its air pressure system | Tabular Diagnostics / Fault Classification | Land Vehicles | 0.054 | 171, 76000 | |
| solar power generation | Power generation and weather data for 2 solar power plants | Fault detection | Electrical | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| telemanom | Telemetry data from 2 spacecraft with labeled anomalies in the testing set | Supervised Anomaly Detection | Robotics | | | |
| turbofan | Multidimensional sensor data from simulated run-to-failure experiments | Time to event prediction | Aircraft | | | Data is hosted on Kaggle, and requires setup before downloading (see comments in get_data.sh) |
| Predictive Maintenance for Electrical Wiring Faults | A dataset to support an automatic optical inspection tool for electrical components using computer vision | Image Fault Detection | Electrical | 0.8195 | 1, 300 | Data is structured to be ready to be used by a YOLOv5 network |
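
Because the list covers many equipment types and sizes, it can help to filter it programmatically. The following is a minimal sketch and not part of this repository: `DatasetInfo`, `DATASETS`, and `find_datasets` are hypothetical names, and the three entries are copied by hand from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetInfo:
    """One row of the dataset list; field names are illustrative, not the repo's schema."""
    name: str
    problem: str
    equipment_type: str
    size_gb: Optional[float]  # None when the list leaves the size blank


# Values copied from three rows of the dataset list.
DATASETS = [
    DatasetInfo("Autonomous Underwater Vehicle",
                "Time Series Diagnostics / Fault Classification", "Maritime", 0.025),
    DatasetInfo("CNC Mill Tool Wear",
                "Time Series Fault Detection", "Tools & Machinery", 0.011),
    DatasetInfo("Maintenance of naval propulsion plants",
                "Time to failure prediction", "Maritime", None),
]


def find_datasets(datasets, equipment_type=None, max_size_gb=None):
    """Return datasets matching the filters, smallest known size first."""
    matches = []
    for d in datasets:
        if equipment_type is not None and d.equipment_type != equipment_type:
            continue
        # A size filter excludes rows whose size is unknown.
        if max_size_gb is not None and (d.size_gb is None or d.size_gb > max_size_gb):
            continue
        matches.append(d)
    # Rows with unknown size sort last.
    return sorted(matches, key=lambda d: (d.size_gb is None, d.size_gb or 0.0))


if __name__ == "__main__":
    for d in find_datasets(DATASETS, equipment_type="Maritime"):
        print(d.name, d.size_gb)
```

Treating a blank size as "unknown" rather than zero keeps a size filter from silently including datasets that may be very large.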

Downloading a Dataset

  1. Navigate to the directory of the dataset you want in this repository.
  2. Download the get_data.sh script.
  3. Run the get_data.sh script in the location where you would like to download the data.

This currently only works on Linux. Some get_data.sh scripts require additional steps before you can run them, which are described in a comment at the top of the file.
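
For researchers who prefer to drive the download from Python, the steps above could be wrapped roughly as follows. This is a sketch under assumptions: `run_get_data` and its `require_linux` flag are hypothetical and not part of this repository, and the script is assumed to be runnable with `bash`. Any extra setup described at the top of a given get_data.sh still has to be done first.

```python
import platform
import subprocess
from pathlib import Path


def run_get_data(script_path, dest_dir, require_linux=True):
    """Run a dataset's get_data.sh with dest_dir as the working directory."""
    if require_linux and platform.system() != "Linux":
        raise RuntimeError("get_data.sh scripts are currently only supported on Linux")
    script = Path(script_path).resolve()
    if not script.is_file():
        raise FileNotFoundError(script)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(["bash", str(script)], cwd=dest, check=True)
    return dest
```

A call would look like `run_get_data("path/to/get_data.sh", "path/to/download/dir")`, where both paths are placeholders for your own locations.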

Adding a Dataset

  1. If the dataset is not already hosted online: upload it to a data hosting site (we recommend Mendeley).
  2. Create a fork of this repository. Clone that fork to your local machine.
  3. Make a copy of the sample_dataset folder and place it within the pmx_data directory. This folder contains sample scripts to get you started.
  4. Rename the directory to match the name of your dataset.
  5. Modify the 'get_data.sh' script to download your data and unpack it into a standard csv format within a subfolder called 'datasets'.
  6. Modify the 'info.yaml' file, filling in information about your dataset that will be used to generate a README.
  7. Optional: Write a 'custom_writeup.md' markdown file containing any information about your dataset that is not encapsulated by info.yaml. This will be added to the generated README.
  8. Optional: Write a 'load_data.py' sample script to load the data into a pandas dataframe or any other python object that is easy to work with (this sample script works well in most instances).
  9. Commit your local changes and push them to your fork on GitHub. Create a pull request from your fork into this repository.
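
As an illustration of step 8, a 'load_data.py' might look like the sketch below. It is not the repository's sample script: the function name `load_data` is an assumption, and it uses only the standard library so that it runs without extra dependencies. It relies on step 5 having unpacked the data as csv files into a 'datasets' subfolder.

```python
import csv
from pathlib import Path


def load_data(dataset_dir="."):
    """Load every csv file in <dataset_dir>/datasets into a dict of row lists."""
    data_dir = Path(dataset_dir) / "datasets"
    tables = {}
    for csv_path in sorted(data_dir.glob("*.csv")):
        with csv_path.open(newline="") as f:
            # Each row becomes a dict keyed by the csv header; values stay strings.
            tables[csv_path.stem] = list(csv.DictReader(f))
    return tables
```

If pandas is installed, replacing the `csv.DictReader` call with `pandas.read_csv(csv_path)` would give one dataframe per file instead of a list of dicts.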

Additional Resources

https://data.phmsociety.org/

https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository

https://zenodo.org/

UCI repository https://archive.ics.uci.edu/ml/datasets.php

https://www.openml.org/

Wishlist

  • More image datasets, e.g. PMx with microscopy or aerial inspection.
  • Tracking performance of different autoML tools on datasets over time.
  • Support for multiple problem types on same dataset (for instance, fault detection can also be anomaly detection if you remove the target variable).
  • Guide to pros and cons of different data hosting services (Mendeley, Kaggle, etc.) for people who want to upload datasets.
  • Allow for searching for datasets with different attributes or sorting by attribute.
  • GitHub Pages GUI to make downloading and uploading easier for non-technical users.
  • Support for downloading and unpacking data in Windows and MacOS.
  • Add domain table for data type, problem type, and equipment type.
  • Standardize train/test splits for future benchmarking when possible.
  • More readable display of BibTeX citations.
  • Fully fill out info.yaml for all datasets.