Wafer D1 & D2 Datasets
This is the README of the Wafer D1 and Wafer D2 datasets. If you use these datasets, please consider citing the following paper:
Furnari G, Vattiato F, Allegra D, Milotta FLM, Orofino A, Rizzo R, De Palo RA, Stanco F. An Ensembled Anomaly Detector for Wafer Fault Detection. Sensors. 2021; 21(16):5465. https://doi.org/10.3390/s21165465
The datasets concern the semiconductor industry. Both contain time series made of a variable number of time samples, and both have 5 reference columns: MaterialID, StepID, duration_ms, target and is_test. The samples are grouped by MaterialID, which represents the production lot. The production process is divided into steps, and the number of steps differs between the two datasets. Steps can be mandatory or optional.
- MaterialID: represents the production lot
- StepID: represents the production step
- duration_ms: represents the normalized time elapsed since the first time sample of the mandatory steps. The first time sample in the mandatory steps has duration_ms equal to 0 and the last one has duration_ms equal to 1. Time samples in optional steps have duration_ms greater than 1 or less than 0.
- Target: a boolean value indicating whether the MaterialID is abnormal: 1 for an abnormal MaterialID, 0 for a normal one.
- is_test: a boolean value indicating whether the time sample was used for training or for testing in our work: 1 means the time sample was used in the test phase, 0 means it was used in the training phase.
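The reference columns above are enough to rebuild per-lot time series and the train/test split. The following is a minimal Python sketch, not the authors' code, under these assumptions: each row has already been loaded as a dict keyed by column name with numeric values (the file format is not described in this README); the label column is spelled `target` (the README uses both `target` and `Target`); and `target` is constant within a MaterialID. `normalize_times` is a hypothetical reconstruction of `duration_ms`, assuming it is a linear rescaling of raw timestamps between the first and last mandatory-step samples. All function names are illustrative.

```python
from collections import defaultdict


def normalize_times(raw_times, first_mandatory, last_mandatory):
    """Hypothetical reconstruction of duration_ms (assumed linear rescaling):
    the first mandatory-step sample maps to 0, the last one to 1, and samples
    outside that interval (optional steps) fall below 0 or above 1."""
    span = last_mandatory - first_mandatory
    return [(t - first_mandatory) / span for t in raw_times]


def group_by_lot(rows):
    """Group time samples by MaterialID; each lot is sorted by duration_ms."""
    lots = defaultdict(list)
    for row in rows:
        lots[row["MaterialID"]].append(row)
    for samples in lots.values():
        samples.sort(key=lambda r: r["duration_ms"])
    return dict(lots)


def lot_labels(lots):
    """Map each MaterialID to its target (1 = abnormal, 0 = normal).
    Assumption: every sample of a lot carries the same target value."""
    labels = {}
    for material_id, samples in lots.items():
        targets = {s["target"] for s in samples}
        if len(targets) != 1:
            raise ValueError(f"MaterialID {material_id} has mixed target values")
        labels[material_id] = targets.pop()
    return labels


def split_rows(rows):
    """Split time samples by the is_test flag: 0 = training, 1 = test."""
    train = [r for r in rows if r["is_test"] == 0]
    test = [r for r in rows if r["is_test"] == 1]
    return train, test
```

For example, with raw timestamps 90, 100, 150, 200 and 230 where the mandatory steps run from 100 to 200, `normalize_times` yields -0.1, 0.0, 0.5, 1.0 and 1.3, so the first and last samples would belong to optional steps.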
The Wafer D1 dataset has 7 steps: 5 mandatory steps with IDs 2, 4, 5, 6 and 7, and 2 optional steps with IDs -1 and -2.
Wafer D1 has 5105 MaterialIDs.
The Wafer D2 dataset has 2 mandatory steps and no optional steps.
Wafer D2 has 1157 MaterialIDs.
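Together with the duration_ms convention, the step IDs give a simple consistency check for Wafer D1: samples in mandatory steps should have duration_ms between 0 and 1, while samples in optional steps should fall outside that interval. The sketch below is illustrative only; it covers Wafer D1 because the README does not list the StepIDs of Wafer D2, and it assumes rows are dicts keyed by column name. The function names are hypothetical.

```python
# Step IDs of Wafer D1 as listed in this README.
D1_MANDATORY_STEPS = frozenset({2, 4, 5, 6, 7})
D1_OPTIONAL_STEPS = frozenset({-1, -2})


def step_kind(step_id):
    """Classify a Wafer D1 StepID as 'mandatory' or 'optional'."""
    if step_id in D1_MANDATORY_STEPS:
        return "mandatory"
    if step_id in D1_OPTIONAL_STEPS:
        return "optional"
    raise ValueError(f"unknown Wafer D1 StepID: {step_id}")


def check_d1_row(row):
    """True if duration_ms agrees with the kind of step: [0, 1] for mandatory
    steps, below 0 or above 1 for optional steps."""
    t = row["duration_ms"]
    if step_kind(row["StepID"]) == "mandatory":
        return 0.0 <= t <= 1.0
    return t < 0.0 or t > 1.0


def count_lots(rows):
    """Number of distinct MaterialIDs (production lots) in a set of rows."""
    return len({r["MaterialID"] for r in rows})
```

On the full datasets, `count_lots` would be expected to return 5105 for Wafer D1 and 1157 for Wafer D2.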
Both datasets have feature columns, and the number of features differs between the two datasets. All features have been standardized with a z-score scaler.
Wafer D1 has 15 feature columns.
Wafer D2 has 20 feature columns.
Wafer D1 has 602108 rows and 20 columns (5 reference columns + 15 feature columns).
Wafer D2 has 126795 rows and 25 columns (5 reference columns + 20 feature columns).
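The column counts above can be checked against a file header, and z-score standardization maps each feature to z = (x - mean) / std. The sketch below is an illustration, not the authors' preprocessing: the feature column names are not given in this README, so the example uses hypothetical names such as `feature_0`; the reference column spelling is assumed; and it uses the population standard deviation, since the README does not say which variant was used or on which subset the scaler was fitted.

```python
import statistics

# Reference columns as named in this README (exact header case is an assumption).
REFERENCE_COLUMNS = ("MaterialID", "StepID", "duration_ms", "target", "is_test")
EXPECTED_FEATURE_COUNTS = {"Wafer D1": 15, "Wafer D2": 20}


def feature_columns(header):
    """All columns that are not reference columns."""
    return [c for c in header if c not in REFERENCE_COLUMNS]


def check_layout(dataset, header):
    """True if the header has every reference column plus the expected
    number of feature columns (20 columns for D1, 25 for D2)."""
    expected = EXPECTED_FEATURE_COUNTS[dataset]
    has_reference = all(c in header for c in REFERENCE_COLUMNS)
    return (has_reference
            and len(feature_columns(header)) == expected
            and len(header) == expected + len(REFERENCE_COLUMNS))


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]
```

Because the released features are already standardized, `z_score` is only meant to show what the transformation does; each feature column in the datasets should already have a mean close to 0 and a standard deviation close to 1 over the data the scaler was fitted on.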