A synthetic benchmark for blind anomaly detectors in industrial time series
This is a synthetic dataset that can be used to challenge blind anomaly detectors for time series.
A very common situation is investigated: a controlled system whose design is based on a set of nominal system parameters. In this case, the closed-loop nature of the system implies that the feedback control may hide some of the consequences of the parametric changes that represent anomalies. On the other hand, the variability of contexts materializes in the values of the set-point changes that are applied and fed to the feedback control algorithm. A schematic view of the system used to build the benchmark dataset is shown hereafter:
Fig. 1: schematic view of the ETC system used to build the dataset
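To illustrate how feedback can mask a parametric change, here is a toy sketch. It is not the ETC model: it assumes a hypothetical first-order plant x' = -a*x + b*u regulated by a PI controller, with made-up gains. Halving the input gain b (a stand-in for a parametric anomaly) leaves the regulated output at the set-point, while the control input u has to compensate.

```python
def simulate_pi_loop(b, a=1.0, setpoint=1.0, kp=2.0, ki=2.0, dt=0.01, t_end=20.0):
    """Simulate x' = -a*x + b*u under PI control and return the final (x, u).

    Toy first-order plant integrated with forward Euler; all parameters are
    illustrative assumptions, unrelated to the benchmark's ETC system.
    """
    x, integral, u = 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        error = setpoint - x          # tracking error seen by the controller
        integral += error * dt        # integral action removes steady-state error
        u = kp * error + ki * integral
        x += dt * (-a * x + b * u)    # Euler step of the plant
    return x, u


if __name__ == "__main__":
    for b in (1.0, 0.5):  # nominal input gain, then a halved "faulty" gain
        x, u = simulate_pi_loop(b)
        print(f"b={b}: x={x:.3f}, u={u:.3f}")
```

At steady state the integral action forces x to the set-point, so u settles at a*setpoint/b: about 1 in the nominal case and about 2 with the halved gain. The output alone barely reveals the change; the control input does.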
The equations of the electronic throttle control system are:
where
The system is controlled by a backstepping feedback design, the details of which are omitted here since they lie outside the scope of anomaly detection.
The benchmark consists of detecting changes affecting the following three parameters:
The raw time series are shown in Figure 2 below:
Fig. 2: Raw version of the time series included in the test dataframe.
The dataset consists of four CSV files, each containing a pandas DataFrame:
df_train.csv: the DataFrame of features for training.
df_train_labels.csv: the DataFrame of labels for training.
df_test.csv: the DataFrame of features for test (this file is too big to be hosted on GitHub; use the following link to download it).
df_test_labels.csv: the DataFrame of labels for test.
To read a dataframe, use the following pandas command:
import pandas as pd
df_train = pd.read_csv('df_train.csv', index_col=0)
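A possible way to load all four files at once is sketched below. The helper name load_benchmark is hypothetical; it assumes the four CSV files listed above sit in one folder and store their index in the first column (as the index_col=0 call implies), and that each features file is row-aligned with its labels file.

```python
from pathlib import Path

import pandas as pd


def load_benchmark(folder="."):
    """Load the four benchmark CSV files from `folder` into a dict.

    Hypothetical helper: file names follow this README, and the first CSV
    column is read as the index, as in the read_csv call above.
    """
    folder = Path(folder)
    frames = {}
    for name in ("df_train", "df_train_labels", "df_test", "df_test_labels"):
        frames[name] = pd.read_csv(folder / f"{name}.csv", index_col=0)

    # Features and labels are assumed to be row-aligned; fail early if not.
    for split in ("train", "test"):
        feats = frames[f"df_{split}"]
        labels = frames[f"df_{split}_labels"]
        if len(feats) != len(labels):
            raise ValueError(
                f"{split}: {len(feats)} feature rows vs {len(labels)} label rows"
            )
    return frames
```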
The following images show the columns of the df_train and df_test_labels dataframes:
[Image: columns of df_train]
[Image: columns of df_test_labels]
Please note that the training data forms the first block of the df_test dataset. If you want to compute statistics, keep in mind that the first sixth of the test data is simply the training data; good results on this part of the test data are therefore expected.
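If you want statistics that exclude the copied training block, a sketch like the following can drop it. The helper held_out_part is hypothetical; it assumes, as stated above, that df_test starts with exactly the rows of df_train in the same order (so the first len(df_train) rows are skipped), that the label frame is row-aligned with df_test, and that the shared columns are numeric so they can be compared with numpy.allclose.

```python
import numpy as np
import pandas as pd


def held_out_part(df_test, df_test_labels, df_train, check=True):
    """Return (features, labels) of df_test without its leading training block.

    Assumes the first len(df_train) rows of df_test are a copy of df_train.
    """
    n_train = len(df_train)
    if check:
        # Compare the leading block of df_test with df_train, column by column.
        head = df_test.iloc[:n_train][list(df_train.columns)].to_numpy(dtype=float)
        train = df_train.to_numpy(dtype=float)
        if head.shape != train.shape or not np.allclose(head, train):
            raise ValueError("the first rows of df_test do not match df_train")
    return df_test.iloc[n_train:], df_test_labels.iloc[n_train:]
```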
Use the training dataset df_train to fit your anomaly detector.
Use the test dataset df_test to predict whether anomalies are present.
Compare your predictions with the label column of the df_test_labels dataframe. Note that 0 represents normal data while 1 represents anomalous data.
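The fit, predict and compare steps above can be sketched with a deliberately naive point-wise baseline. This is not the benchmark author's method: the z-score detector, the threshold of 4 standard deviations and the function names are illustrative assumptions. It uses the nominal columns x1 and u and scores predictions with precision, recall and F1 against the label column.

```python
import numpy as np
import pandas as pd

NOMINAL_COLUMNS = ["x1", "u"]  # nominal benchmark columns, per this README


def fit_zscore(df_train, columns=NOMINAL_COLUMNS):
    """Fit per-column mean and (population) standard deviation on training data."""
    data = df_train[columns]
    # Guard against constant columns so the later division is well defined.
    return data.mean(), data.std(ddof=0).replace(0.0, 1.0)


def predict_zscore(df, stats, threshold=4.0, columns=NOMINAL_COLUMNS):
    """Point-wise prediction: 1 if any column deviates by more than `threshold` std."""
    mean, std = stats
    z = ((df[columns] - mean) / std).abs()
    return (z.max(axis=1) > threshold).astype(int).to_numpy()


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for 0/1 labels, with 1 meaning anomalous."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With the frames loaded, a typical sequence would be stats = fit_zscore(df_train), then y_pred = predict_zscore(df_test, stats), then precision_recall_f1(df_test_labels["label"], y_pred). A static threshold like this may well react to set-point changes rather than to anomalies, so treat it only as a reference point.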
The prediction can be performed either over a moving window spanning the time series or point-wise.
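For window-based evaluation, point-wise labels and point-wise predictions can be turned into one value per window. The sketch below assumes a window is anomalous as soon as one of its samples is; this convention, the helper name to_windows and the non-overlapping default step are assumptions, not part of the benchmark definition.

```python
import numpy as np


def to_windows(point_flags, width, step=None):
    """Aggregate point-wise 0/1 flags into one flag per window.

    A window is flagged 1 if at least one sample inside it is flagged.
    `step` defaults to `width` (non-overlapping windows); trailing samples
    that do not fill a whole window are dropped.
    """
    flags = np.asarray(point_flags).astype(int)
    step = width if step is None else step
    starts = range(0, len(flags) - width + 1, step)
    return np.array([flags[s:s + width].max() for s in starts], dtype=int)
```

Applying the same to_windows call to df_test_labels["label"] and to the point-wise predictions keeps both sequences the same length, so they can be compared directly.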
The nominal benchmark involves only the columns x1 and u of the dataframe, but feel free to use fewer or more columns.
The other columns of df_test_labels are extra columns that explain the origin of the anomalies.
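To see which kinds of anomalies a detector catches, the detection rate can be broken down by one of these extra columns. Their names are not listed in this text, so origin_col below is whichever column you pick; the helper recall_by_origin is hypothetical and assumes the predictions are row-aligned with df_test_labels.

```python
import pandas as pd


def recall_by_origin(df_test_labels, y_pred, origin_col, label_col="label"):
    """Fraction of anomalous samples (label == 1) detected, per value of origin_col."""
    frame = df_test_labels[[label_col, origin_col]].copy()
    # Assign by position so a differently indexed prediction array still lines up.
    frame["pred"] = list(y_pred)
    anomalous = frame[frame[label_col] == 1]
    return anomalous.groupby(origin_col)["pred"].mean()
```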
If you are interested in a richer set of anomaly values, please contact me.