
A curated collection of public industrial datasets.


Awesome Industrial Datasets - A Curated Collection of Public Industrial Datasets.

This repository is maintained by a small team. We greatly appreciate any assistance you can provide. If you're interested in contributing, please refer to our Contribution Guideline.

This repository started as a fork from a deprecated project by MakinaRocks.

Sectors


Datasets

Battery

Chemical

  • Dynamic Gas Mixtures: The dataset contains the recordings of 16 chemical sensors exposed to two dynamic gas mixtures at varying concentrations. For each mixture, signals were acquired continuously for 12 hours.

  • Gas Sensor Array Drift: This archive contains 13910 measurements from 16 chemical sensors exposed to 6 different gases at various concentration levels.

  • Gas sensor arrays in open sampling settings Data Set: The dataset contains 18000 time-series recordings from a chemical detection platform at six different locations in a wind tunnel facility in response to ten high-priority chemical gaseous substances.

  • Wine Quality: Two datasets are included, related to red and white "Vinho Verde" wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

Control Loop

  • ISDB - International Stiction Data Base: This is an international database of industrial control loops (most of them suffering from stiction) from different fields.

  • Oscillation detection artificial dataset: This dataset provides simulated oscillatory and non-oscillatory time series for machine learning classification tasks.

  • SACAC: The aim of this repository is to provide a test environment for control loop performance monitoring methods.

  • SISO-RAW: Raw data collected over two and a half days from 52 control loops of an oil and gas company. For each loop, the controller output (OP), process variable (PV), setpoint (SP), and valve stem position (MV) are recorded.
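
The four signals named in the SISO-RAW entry can be related with a short sketch. It assumes a hypothetical in-memory `LoopRecord` container and made-up sample values; it does not reflect the actual file format of any dataset listed here. The control error is the standard `SP - PV`, and counting its sign changes is only a crude, illustrative indicator of oscillation, not a method taken from these repositories.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoopRecord:
    """Hypothetical container for one control loop (not the SISO-RAW format)."""
    sp: List[float]  # setpoint
    pv: List[float]  # process variable
    op: List[float]  # controller output
    mv: List[float]  # valve stem position


def control_error(record: LoopRecord) -> List[float]:
    """Return the control error e[k] = SP[k] - PV[k] for each sample."""
    return [s - p for s, p in zip(record.sp, record.pv)]


def zero_crossings(signal: List[float]) -> int:
    """Count sign changes between consecutive non-zero samples."""
    nonzero = [x for x in signal if x != 0.0]
    return sum(1 for a, b in zip(nonzero, nonzero[1:]) if (a < 0) != (b < 0))


# Made-up samples for illustration only.
record = LoopRecord(
    sp=[50.0, 50.0, 50.0, 50.0, 50.0, 50.0],
    pv=[49.0, 51.0, 49.5, 50.5, 49.0, 51.0],
    op=[30.0, 28.0, 29.5, 28.5, 30.0, 28.0],
    mv=[30.0, 30.0, 29.5, 29.5, 30.0, 30.0],
)
error = control_error(record)
crossings = zero_crossings(error)
print(error, crossings)
```

A real analysis would need to handle noise, missing samples, and differing sampling rates, none of which this sketch attempts.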

Mechanical

Oil and Gas

  • 3W: A realistic public dataset of rare undesirable real events in oil wells. It can be used as a benchmark for developing machine learning techniques that must cope with the inherent difficulties of real data.

Power

  • Appliance Energy: Experimental data used to create regression models of appliances energy use in a low energy building.

  • BLUED dataset: The dataset consists of voltage and current measurements for a single-family residence in the United States, sampled at 12 kHz for a whole week.

  • Combined Cycle Power Plant: The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the plant was set to work with full load.

  • ECO (Electricity Consumption & Occupancy): The ECO data set is a comprehensive data set for non-intrusive load monitoring and occupancy detection research.

  • GREEND: GREEND is an energy dataset containing power measurements collected from multiple households in Austria and Italy. It provides detailed energy profiles on a per device basis with a sampling rate of 1 Hz.

  • REDD: A Public Data Set for Energy Disaggregation Research: A freely available data set containing detailed power usage information from several homes, which is aimed at furthering research on energy disaggregation (the task of determining the component appliance contributions from an aggregated electricity signal).

  • UK-DALE dataset: This dataset records the power demand from five houses. In each house, both the whole-house mains power demand and the power demand from individual appliances are recorded every six seconds. In three of the five houses (houses 1, 2 and 5), the whole-house voltage and current are also recorded at 16 kHz.
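
The REDD entry above describes energy disaggregation as determining each appliance's contribution from an aggregated electricity signal. The toy sketch below illustrates the idea under strong assumptions: the appliance names, power ratings, and traces are all made up, and the step-matching rule is a deliberately simple stand-in, not the method used by any dataset or paper listed here.

```python
from typing import Dict, List, Tuple

# Hypothetical appliance power ratings in watts (made up for illustration).
RATINGS: Dict[str, float] = {"kettle": 2000.0, "fridge": 100.0}


def detect_events(aggregate: List[float], threshold: float = 50.0) -> List[Tuple[int, float]]:
    """Return (sample index, power step) for each change larger than threshold."""
    events = []
    for i in range(1, len(aggregate)):
        step = aggregate[i] - aggregate[i - 1]
        if abs(step) > threshold:
            events.append((i, step))
    return events


def label_event(step: float) -> str:
    """Attribute a step to the appliance whose rating is closest in magnitude."""
    name = min(RATINGS, key=lambda n: abs(RATINGS[n] - abs(step)))
    return f"{name} {'on' if step > 0 else 'off'}"


# Made-up per-appliance traces; the mains reading is their sum.
fridge = [0.0, 100.0, 100.0, 100.0, 100.0, 0.0]
kettle = [0.0, 0.0, 2000.0, 2000.0, 0.0, 0.0]
aggregate = [f + k for f, k in zip(fridge, kettle)]

labels = [(i, label_event(step)) for i, step in detect_events(aggregate)]
print(labels)
```

Datasets such as those listed in this section provide per-appliance measurements alongside the mains signal, which is what makes it possible to check labels like these against ground truth.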

Semicon

Steel

  • Steel Industry Energy Consumption: The data is collected from a smart small-scale steel industry in South Korea.

  • Steel Plates Faults: A dataset of steel plate faults, classified into 7 different types. The goal was to train machine learning models for automatic pattern recognition.

Others

  • APS System Failures: The dataset's positive class consists of component failures for a specific component of the APS (Air Pressure System). The negative class consists of trucks with failures for components not related to the APS.

  • Hill-Valley: This is NOT a manufacturing dataset, but it looks useful for testing pattern detection methods.