/Smart-Manufacturing-AD

Repository linked to "Anomaly detection in Smart-manufacturing era: A review"

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

HELLO-WORLD

Smart Manufacturing Anomaly Detection

This is the code and dataset repository for Smart-Manufacturing (SM) Anomaly Detection (AD) that supports a paper in preparation:

"Anomaly detection in Smart-manufacturing era: A review"

Faults and unscheduled stops resulting from machine breakdowns in manufacturing processes have a substantial impact on both global direct and indirect production costs, as well as compliance with established delivery deadlines. With the continuous expansion of smart manufacturing environments, where shared data are readily available in real time, the implementation of anomaly detection strategies has become increasingly necessary. However, this task has also become more complex due to the multitude of diverse scenarios and methods that need to be considered. The paper presents a comprehensive state-of-the-art review, supported by experiments conducted on a repository of real manufacturing datasets included here. We provide a practical classification framework specifically tailored to smart manufacturing environments, which describes recent and successful methods and algorithms that have been applied to real scenarios. Furthermore, we introduce an experimental evaluation by executing state-of-the-art anomaly detection algorithms, aimed at gaining practical insights into the most suitable methods for various manufacturing environments.

The Team

SmartManuAD is an effort of the Department of Statistics, Informatics and Mathematics of the Public University of Navarra (UPNA), in a collaborative work between Miguel Pagola, PhD professor in Computer Science and Artificial Intelligence at the UPNA, and Iñaki Elía, PhD student in Artificial Intelligence as well as Robotics & Automation Engineer for different Aerospace companies.

Smart Manufacturing concept

SM is a specific application of the Industrial Internet of Things (IIoT) that utilizes embedded sensors to collect operational data and integrate it with physical and digital processes within factories. Through the analysis of data streams across entire factories or multiple facilities, manufacturing engineers and data analysts can, among other purposes, identify faults or anomalies in different processes, diagnose their potential causes, assess the health of systems or prognose the future state of health based on their current condition.

As shown in the following image, we established 5 categories related to SM concepts and 6 categories associated with common manufacturing processes or systems. Our intention was to map publications and datasets to a reformulated classification that is more focused on the emerging SM scenarios.

HELLO-WORLD

Example of Evaluated Results

Our research evaluates 16 state-of-the-art AD algorithms on 29 publicly available real-world datasets in the field of Smart-Manufacturing. These datasets were selected from a pool of 100 publicly available ones, covering various industrial domains, such as additive and subtractive manufacturing, machinery and equipment, automotive, aerospace, rail, energy, logistics, chemicals, electronics, semiconductors, and multi-stage mass manufacturing, among others. The 29 chosen datasets contain real anomalies, which were extrapolated from the information provided by the sources, processed, and curated to create balanced datasets for the analysis.

As an example of the results obtained, see the following figure that shows the AUC-ROC and AUC-PR metrics for the Motors & Drivetrains category and for the 16 selected algorithms.

HELLO-WORLD

Anomaly detection algorithms and packages

We use important anomaly detection libraries for our study, including anomaly detection for tabular (PyOD) and time-series data (TODS). We also use ADBench as the basis for our work with the intention of extending the benchmark of AD algorithms to the SM domain. Please refer to the links above for more information.

Code and Utilities

The code used from ADBench is written in Python 3 (see related dependencies). We simply tuned a notebook ("SmartManuAD_Test_Notebook.ipynb") to help with the specifics of Smart-Manufacturing environments.

Dataset repository

You can find below the list of 29 publicly SM datasets used in our research. They include real anomalies. In the datasets folder you can find the expanded list of 100 collected datasets, though most of them do not include real anomalies, but we injected 10% synthetic ones. Both lists include links to the original dataset sources so that you can get the original metadata for the datasets.

The datasets are processed and converted to the Numpy .npz file format, which is a compressed archive of files. This also allows you to work with some of the utilities and code introduced in ADBench.

ID Transformed Name Link to source Description Original_Format Economic Activity Industrial Sector SM_Category SM_Subcategory Labels Data_Type # Processed Instances # Processed Features # Anomalies % Anomalies
5 cfrp Anonymous CFRP (Carbon Fiber Reinforcement Polymer): Automatic Fiberplacement fabrication priocess for aerostructures. CSV Aerospace CFRP aerostructures manufacturing Cutting & Additive Manufacturing Additive Manufacturing Real Time Series Multivariate 52.268 49 262 0,50%
14 PlantMonitoring link Production Plant Data for Condition Monitoring CSV Machinery & Equipment Plant generic fabrication Syst. Integration & Multistage manufacturing Multistage manufacturing Real Time Series Multivariate 18.429 22 1.394 7,56%
15 Multistage link Multi-stage continuous-flow manufacturing process CSV Machinery & Equipment Machining manufacturing Syst. Integration & Multistage manufacturing Multistage manufacturing Real Time Series Multivariate 14.088 41 13.050 92,63%
18 Robotfail link Robot Execution Failures Data Set This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque samples collected at regular time intervals. CSV Machinery & Equipment Robotics Robotics & AGVs Robots Real Tabular Multivariate 164 89 17 10,37%
21 Engine link Public D3M datasets: Engine (Simple engine data): CSV Automotive components Automotive Motors Motors & Drivetrains Motors Real Time Series Multivariate 383 6 126 32,90%
31 Milling link NASA - Mill Data Set: MAT Machinery & Equipment Machining manufacturing Cutting & Additive Manufacturing CNC Machining Real Time Series Multivariate 117.000 6 27.000 23,08%
35 IMS link IMS bearing defect data provided by the University of Cincinnati The vibration data were obtained by attaching an accelerometer to the four bearings connected to the shaft. Each dataset comprised three test datasets, each of which was the test-tofailure experimental data of the bearing. The data were recorded at intervals of 1 s for 10 min at a sampling rate of 20 kHz. ASCII Machinery & Equipment Machinery & Equipment Motors & Drivetrains Bearings Real Time Series Multivariate 29.154.880 8 6.654.880 22,83%
36 PHM link PRONOSTIA Bearing Dataset The PRONOSTIA (also called FEMTO) bearing dataset consists of 17 accelerated run-to-failures on a small bearing test rig. Both acceleration and temperature data was collected for each experiment. The dataset was used in the 2012 IEEE Prognostic Challenge. The dataset is from FEMTO-ST Institute in France. CSV Machinery & Equipment Machinery & Equipment Motors & Drivetrains Bearings Real Time Series Multivariate 7.175.680 2 11.750 0,16%
42 Motorcondition1 link Condition Monitoring Dataset (AI4I 2021) - HTW Berlin Measured Time-Series data for 8 different operating conditions. CSV Machinery & Equipment Machinery & Equipment Motors & Drivetrains Motors Real Time Series Multivariate 24.000 3 6.000 25,00%
43 Motorcondition2 link Condition Monitoring Dataset (AI4I 2021) - HTW Berlin Measured frequency data for 8 different operating conditions. CSV Machinery & Equipment Machinery & Equipment Motors & Drivetrains Motors Real Time Series Multivariate 2.000 169 500 25,00%
47 FordB link FordB: This data was originally used in a competition in the IEEE World Congress on Computational Intelligence, 2008. The classification problem is to diagnose whether a certain symptom exists or does not exist in an automotive subsystem. TXT Automotive components Automotive Motors Motors & Drivetrains Motors Real Time Series Multivariate 1.818.000 1 930.000 51,16%
54 CNCMachining link CNC Machining Data: The dataset provided is a collection of real-world industrial vibration data collected from a brownfield CNC milling machine. The acceleration has been measured using a tri-axial accelerometer (Bosch CISS Sensor) mounted inside the machine. The X- Y- and Z-axes of the accelerometer have been recorded using a sampling rate equal to 2 kHz. H5 Machinery & Equipment Machining manufacturing Cutting & Additive Manufacturing CNC Machining Real Time Series Multivariate 99.399 3 38.983 39,22%
55 Boschline link Bosch Production Line Performance: A good chocolate soufflé is decadent, delicious, and delicate. But, it's a challenge to prepare. When you pull a disappointingly deflated dessert out of the oven, you instinctively retrace your steps to identify at what point you went wrong. CSV Machinery & Equipment Chocolate packaging line Syst. Integration & Multistage manufacturing Multistage manufacturing Real Time Series Multivariate 1.183.747 158 6.879 0,58%
65 MachineryFault link link Machinery Fault Dataset - Induction Motor Faults Database: The used SpectraQuest Inc. Alignment/Balance Vibration Trainer (ABVT) Machinery Fault Simulator (MFS). CSV Automotive components Induction Motors Motors & Drivetrains Bearings Real Time Series Multivariate 2.250.000 8 1.250.000 55,56%
66 Cuttingblade link One Year Industrial Component Degradation - Degration of a cutting blade: This dataset contains the machine data of a degrading component recorded over the duration of 12 month total. It was initiated in the European research and innovation project IMPROVE. CSV Food and beverage packaging Beverage packaging line Syst. Integration & Multistage manufacturing Multistage manufacturing Real Time Series Multivariate 12.288 7 6.144 50,00%
71 Multimachine link Multiple Machine sensors data CSV Machinery & Equipment Machining manufacturing Syst. Integration & Multistage manufacturing Multistage manufacturing Real Time Series Multivariate 220.320 50 14.484 6,57%
72 CNCturning link CNC turning: roughness, forces and tool wear: Made in the COMPETENCE CENTER IN MANUFACTURING (CCM), a laboratory of the AERONAUTICS INSTITUTE OF TECHNOLOGY (ITA). CSV Machinery & Equipment Machining manufacturing Cutting & Additive Manufacturing CNC Machining Real Tabular Multivariate 612 20 96 15,69%
76 UCIAccelerometer link Dataser de Acelerómetro - UCI Datasets Accelerometer data from vibrations of a cooler fan with weights on its blades. It can be used for predictions, classification and other tasks that require vibration analysis, especially in engines. CSV Machinery & Equipment Electric Motors Motors & Drivetrains Motors Real Time Series Multivariate 153.000 3 102.000 66,67%
80 UCIRobotExectFail link Robot Execution Failures Data Set - UCI Datasets This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque samples collected at regular time intervals. TXT Machinery & Equipment Robotics Robotics & AGVs Robots Real Time Series Multivariate 458 90 291 63,54%
84 VersatileProduction link Versatile Production: Popcorn production process data with multiple process steps. CSV Food and beverage packaging Popcorn packaging line Syst. Integration & Multistage manufacturing Multistage manufacturing Real Tabular Multivariate 10.529 73 8 0,08%
86 HighStorageSystem link High Storage System Anomaly Detection: Storage test rig process data for anomaly detection. CSV Machinery & Equipment Plant generic fabrication Syst. Integration & Multistage manufacturing Multistage manufacturing Real Time Series Multivariate 49.552 18 5.670 11,44%
88 GenesisPickPlace link Genesis Pick-and-Place Demonstrator: Material sorting test rig process data for anomaly detection. CSV Machinery & Equipment Manipulador Pick & Place Robotics & AGVs Robots Real Tabular Multivariate 16.220 18 50 0,31%
91 PlantFaultDetection link Plant Fault Detection - PHM 2015: Anonymized process data for plant fault detection. CSV Machinery & Equipment Plant generic fabrication Syst. Integration & Multistage manufacturing Multistage manufacturing Real Tabular Multivariate 672.530 7 201.854 30,01%
94 ROSIndustrialArmAnomaly link ROS Anomaly Detector package - Anomaly Detection for Industrial Arm Applications: The ROS Anomaly Detector Module (ADM) is designed to execute alongside industrial robotic arm tasks to detect unintended deviations at the application level. The ADM utilizes a learning based technique to achieve this. The process has been made efficient by building the ADM as a ROS package that aptly fits in the ROS ecosystem. CSV Machinery & Equipment Robotics Robotics & AGVs Robots Real Tabular Multivariate 20.221 6 4.692 23,20%
95 CWRUBearing link CWRU Bearing Data: Bearing test rig accelerometer data for fault detection. MAT Machinery & Equipment Bearings Motors & Drivetrains Bearings Real Time Series Multivariate 1.210.422 2 242.616 20,04%
97 MFPT link Condition Based Maintenance Fault Database for Testing of Diagnostic and Prognostics Algorithms MAT Machinery & Equipment Oil Pumps Motors & Drivetrains Bearings Real Time Series Univariate 1.318.356 2 146.468 11,11%
98 DieselEngineFaults link Diesel Engine Faults Features: Fault detection based on pressure curves and vibration. MAT Automotive components Automotive Motors Motors & Drivetrains Motors Real Time Series Univariate 3.500 97 3.250 92,86%
99 TurningChatterDiagnosis link Turning Dataset for Chatter Diagnosis Using Machine Learning: Sensory data of a turning test rig and varying strengths of chatter. MAT Machinery & Equipment Machining manufacturing Cutting & Additive Manufacturing CNC Machining Real Time Series Multivariate 6.400.000 5 3.200.000 50,00%