Predictive Maintenance poses a number of challenges for machine learning. The general types of machine learning problems encountered in predictive maintenance are:
- Time to failure prediction
- Anomaly detection
- Clustering
- Fault detection and root cause analysis
The purpose of this repository is to help researchers start working on predictive maintenance quickly. It provides an overview of relevant predictive maintenance data and 'quick start' scripts for researchers. This is a metadata repository: it does not contain the data itself, only information about the data and scripts for downloading and working with it.
- Video Walkthrough
- Overview
- List of Datasets
- Downloading a Dataset
- Adding a Dataset
- Additional Resources
- Wishlist
A short 5-minute video walkthrough is available here
Dataset | Description | Problems | Equipment Type | Size (GB) | Features, Rows | Note |
---|---|---|---|---|---|---|
maintnet | Maintenance action write-ups for aircraft, automotive, and facility domains | Language model, event forecasting | Aircraft, automotive, and facility | |||
Autonomous Underwater Vehicle | Time series measurements from an underwater vehicle with 5 fault types | Time Series Diagnostics / Fault Classification | Maritime | 0.025 | 17, 1225 | rar data compression - requires additional setup to download (see comments in get_data.sh) |
CNC Mill Tool Wear | Time series machining data across 18 CNC milling experiments | Time Series Fault Detection | Tools & Machinery | 0.011 | 47, 18 | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
Delta Robot | Time series sensor data for a robot used in a production line | Time Series Anomaly Detection | Robotics | 0.000348 | ||
Diesel Engine Faults | Diesel engine data from failure scenarios across 4 operating states | Diagnostics / Fault Classification | Engines | 0.0061 | | Data is in .mat format |
electrical fault detection | Line currents and voltages of an electrical system with 4 fault conditions | Fault Detection and Classification | Electrical | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
gearbox fault detection | Bench test gearbox with 7 induced faults to detect in 3 channels, high-frequency time series | Fault Detection | Gearboxes & Mechanisms | |||
gearbox fault diagnosis | Vibration measurements from healthy and faulty gearboxes with varying loads and recording frequencies | Fault detection | Gearboxes & Mechanisms | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
hdd data | 2013 Backblaze hard drive failure data | Time to failure prediction | Computer HW & IOT | |||
hydraulic sensor system | Time series sensor readings on a hydraulic test rig with target condition values | Fault Detection and Classification | Tools & Machinery | |||
li-ion battery aging | Data from tests on 4 Lithium-Ion batteries cycled under random currents | Time to failure prediction | Batteries | |||
machinery faults datasets | Extremely large dataset of simulated time series machinery data with 6 operating states, each of which can have several fault types | Fault classification | Tools & Machinery | |||
Maintenance of naval propulsion plants | Synthetic gas turbine data | Time to failure prediction | Maritime | |||
nasa milling prognostic dataset | Investigating wear on a milling machine under several runs in various operating conditions | Fault Detection | Tools & Machinery | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
one year industrial component degradation | High-frequency time series data documenting degradation of an industrial component over the course of a year | Time to failure prediction | Tools & Machinery | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
plant fault detection | Time series measurements from 70 plants with 6 fault types | Fault detection | Gearboxes & Mechanisms | |||
pmx for aircraft machine and components | Telemetry time series data and maintenance records for aircraft | Time to failure prediction | Aircraft | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
pmx for ga | Per-second sensor data from flights of a Cessna 172S preceding maintenance | Time to failure prediction | Aircraft | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
pmx from elevator industry | Data on elevator ball bearing wear | Time to failure prediction | Tools & Machinery | |||
pmx iot sensor | Data on failure of heat exchangers on an assembly line | Time to failure prediction | Tools & Machinery | |||
prediction of downtime duration | Predicting downtime duration of car manufacturing assembly lines | Time to event prediction | Tools & Machinery | |||
predictive maintenance fault classification | Sensor data from a drill press under induced fault conditions | Fault detection and classification | Tools & Machinery | |||
production plant data for condition monitoring | Sensor data from 8 run-to-failure experiments on a production line component | Time to failure prediction | Tools & Machinery | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
pump sensor | Time series sensor readings from a water pump with status | Anomaly detection | Tools & Machinery | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
robot execution failures | Sensor data from a robot after 5 different types of failures | Time to failure prediction | Robotics | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
Air pressure system failures in Scania trucks | Classifying whether truck failure is a result of its air pressure system | Tabular Diagnostics / Fault Classification | Land Vehicles | 0.054 | 171, 76000 | |
solar power generation | Power generation and weather data for 2 solar power plants | Fault detection | Electrical | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
telemanom | Telemetry data from 2 spacecraft with labeled anomalies in the testing set | Supervised Anomaly Detection | Robotics | |||
turbofan | Multidimensional sensor data from simulated run-to-failure experiments | Time to event prediction | Aircraft | | | Data is hosted on Kaggle and requires setup before downloading (see comments in get_data.sh) |
Predictive Maintenance for Electrical Wiring Faults | A dataset to support an automatic optical inspection tool for electrical components using computer vision | Image Fault Detection | Electrical | 0.8195 | 1, 300 | Data is structured to be ready for use by a YOLOv5 network |
- Navigate to the directory of the dataset you want in this repository.
- Download the get_data.sh script.
- Run the get_data.sh script in the location where you would like to download the data.
This currently only works on Linux. Some get_data.sh scripts require additional steps before you can run them, which are described in a comment at the top of the file.
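The final download step can also be scripted. The sketch below is one possible way to do it, assuming `bash` is available and that `get_data.sh` has already been downloaded and any extra setup from its header comment is done. The helper name `run_get_data` and its arguments are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path


def run_get_data(script_path, target_dir):
    """Run a downloaded get_data.sh with target_dir as the working directory.

    Hypothetical helper: it only wraps the manual step of running the
    script in the location where the data should end up.
    """
    script = Path(script_path).resolve()
    target = Path(target_dir)
    if not script.is_file():
        raise FileNotFoundError(f"No script found at {script}")
    # Create the download location if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the script exits with an error.
    result = subprocess.run(["bash", str(script)], cwd=target, check=True)
    return result.returncode


# Example call (paths are placeholders):
# run_get_data("get_data.sh", "data/my_dataset")
```

Running the script through `subprocess.run` with `cwd` set keeps the downloaded files inside the chosen directory, which mirrors running the script by hand from that location.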
- If the dataset is not already hosted online: upload it to a data hosting site (we recommend Mendeley).
- Create a fork of this repository.
- Clone that fork to your local machine.
- Make a copy of the sample_dataset folder and place it within the pmx_data directory. This folder contains sample scripts to get you started.
- Rename the directory to match the name of your dataset.
- Modify the 'get_data.sh' script to download your data and unpack it into a standard csv format within a subfolder called 'datasets'.
- Modify the 'info.yaml' file, filling in information about your dataset that will be used to generate a README.
- Optional: Write a 'custom_writeup.md' markdown file containing any information about your dataset that is not encapsulated by info.yaml. This will be added to the generated README.
- Optional: Write a 'load_data.py' sample script to load the data into a pandas DataFrame or any other Python object that is easy to work with (this sample script works well in most instances).
- Commit your local changes and push them to your fork on GitHub.
- Create a pull request from your fork into this repository.
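As an illustration of the optional 'load_data.py' step, the sketch below loads every CSV file from the 'datasets' subfolder into plain Python objects. It is not the sample script shipped in the sample_dataset folder. The function name `load_data` and the returned dictionary layout are assumptions, and it uses only the standard library so it runs without extra dependencies. Where pandas is installed, `pandas.read_csv` on each file would give DataFrames instead.

```python
import csv
from pathlib import Path


def load_data(dataset_dir="datasets"):
    """Load each CSV file in dataset_dir into a list of row dictionaries.

    Hypothetical layout: returns {file stem: [row, ...]}, where each row
    maps column names to string values exactly as stored in the file.
    """
    tables = {}
    for csv_path in sorted(Path(dataset_dir).glob("*.csv")):
        # newline="" is the mode the csv module expects for file objects.
        with csv_path.open(newline="") as handle:
            tables[csv_path.stem] = list(csv.DictReader(handle))
    return tables


# Example call, run from the dataset's directory after get_data.sh:
# tables = load_data("datasets")
```

All values are returned as strings; converting columns to numeric types is left to the dataset-specific script, since the right types depend on the data.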
- NASA Prognostics Center of Excellence data set repository: https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository
- UCI repository: https://archive.ics.uci.edu/ml/datasets.php
- More image datasets, e.g. PmX with microscopy or aerial inspection.
- Tracking performance of different autoML tools on datasets over time.
- Support for multiple problem types on same dataset (for instance, fault detection can also be anomaly detection if you remove the target variable).
- Guide to pros and cons of different data hosting services (Mendeley, Kaggle, etc.) for people who want to upload datasets.
- Allow for searching for datasets with different attributes or sorting by attribute.
- GitHub Pages GUI to make downloading and uploading easier for non-technical users.
- Support for downloading and unpacking data on Windows and macOS.
- Add a domain table for data type, problem type, and equipment type.
- Standardize train/test splits for future benchmarking when possible.
- More readable display of BibTeX citations.
- Fully fill out info.yaml for all datasets.