A small synthetic benchmark for blind anomaly detectors for industrial time series
This is a synthetic dataset that can be used to challenge **blind anomaly detectors for time series**.
The dynamical system for which the dataset is created is the famous Lorenz attractor, which is governed by the following set of ordinary differential equations:
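For reference, the Lorenz system is usually written in the textbook form below. Mapping the states to $x_1, x_2, x_3$ is an assumption based on the column names used in this dataset, and the parameter normalization used to build the dataset is not reproduced here.

$$
\begin{aligned}
\dot{x}_1 &= \sigma\,(x_2 - x_1) \\
\dot{x}_2 &= x_1\,(\rho - x_3) - x_2 \\
\dot{x}_3 &= x_1 x_2 - \beta\,x_3
\end{aligned}
$$

Here $\sigma$, $\rho$ and $\beta$ are the system parameters. The classic chaotic regime uses $\sigma = 10$, $\rho = 28$ and $\beta = 8/3$; whether these are the nominal values behind the normalized value of 1 in this dataset is not stated.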
Note that the green regions correspond to nominal parameters (normalized value = 1), while the red regions correspond to changes in the parameter values that the anomaly detector is expected to detect.
The dataset consists of four CSV files, each storing a pandas dataframe:

- `df_train.csv`: the dataframe of features for training
- `df_train_labels.csv`: the dataframe of labels for training
- `df_test.csv`: the dataframe of features for testing
- `df_test_labels.csv`: the dataframe of labels for testing
To read a dataframe, use the following pandas command:

```python
import pandas as pd
df_train = pd.read_csv('df_train.csv', index_col=0)
```
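The same call can be repeated for the other three files. The sketch below is one possible way to do that. The helper name `load_benchmark`, the assumption that every file stores its index in the first column, and the length check between features and labels are illustrative choices, not part of the dataset.

```python
from pathlib import Path

import pandas as pd

# File names as listed in the dataset description.
FILES = {
    "train": "df_train.csv",
    "train_labels": "df_train_labels.csv",
    "test": "df_test.csv",
    "test_labels": "df_test_labels.csv",
}


def load_benchmark(data_dir="."):
    """Read the four CSV files into a dict of dataframes.

    Assumption: every file stores its index in the first column,
    like the documented read of df_train.csv.
    """
    data_dir = Path(data_dir)
    frames = {
        name: pd.read_csv(data_dir / fname, index_col=0)
        for name, fname in FILES.items()
    }
    # Assumption: features and labels describe the same rows.
    for split in ("train", "test"):
        if len(frames[split]) != len(frames[f"{split}_labels"]):
            raise ValueError(f"{split}: features and labels differ in length")
    return frames
```

Calling `frames = load_benchmark("path/to/data")` then gives access to `frames["train"]`, `frames["test_labels"]`, and so on.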
The following images show the columns of the `df_train` and `df_test_labels` dataframes.
df_train
=============
df_test_labels
=============
Please note that the training data forms the first block of the `df_test` dataset. If you want to compute statistics, it is important to know that the first sixth of the test data is simply the training data. You should therefore expect good results on this part of the test data.
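To avoid optimistic statistics, the repeated block can be separated from the rest of the test set before scoring. The sketch below is a minimal way to do this. The helper name `split_test_blocks` is hypothetical, and it assumes the repeated block is exactly the first sixth of the rows, which is worth checking against `len(df_train)` on the real files.

```python
import pandas as pd


def split_test_blocks(df_test, df_test_labels, seen_fraction=1 / 6):
    """Split the test set into the leading block that repeats the
    training data and the remaining, unseen rows.

    Assumption: the repeated block is exactly the first `seen_fraction`
    of the rows.
    """
    n_seen = int(round(len(df_test) * seen_fraction))
    seen = (df_test.iloc[:n_seen], df_test_labels.iloc[:n_seen])
    unseen = (df_test.iloc[n_seen:], df_test_labels.iloc[n_seen:])
    return seen, unseen


# Tiny synthetic stand-in for the real files (12 rows).
df_test = pd.DataFrame({"x1": range(12), "x3": range(12)})
df_test_labels = pd.DataFrame({"label": [0] * 6 + [1] * 6})

(seen_X, seen_y), (unseen_X, unseen_y) = split_test_blocks(df_test, df_test_labels)
print(len(seen_X), len(unseen_X))
```

Reporting metrics on the unseen part only, or on both parts separately, keeps the repeated training block from inflating the results.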
- Use the training dataset `df_train` to fit your anomaly detector.
- Use the test dataset `df_test` to predict whether anomalies are present.
- Compare your predictions to the `label` column of the `df_test_labels` dataframe. Note that `0` represents normal data while `1` represents anomalous data.
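The fit, predict and compare steps can be sketched with a deliberately simple baseline. The z-score detector below is a stand-in chosen for brevity, not a method suggested by the dataset, and it may perform poorly on the real data, where anomalies come from parameter changes rather than amplitude outliers. The synthetic dataframes, the threshold of 4 and the function names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fit_zscore(df_train):
    """Store the per-column mean and standard deviation of the training data."""
    return df_train.mean(), df_train.std(ddof=0)


def predict_zscore(df, mean, std, threshold=4.0):
    """Flag a row as anomalous (1) when any column lies more than
    `threshold` training standard deviations from the training mean."""
    z = ((df - mean) / std).abs()
    return (z.max(axis=1) > threshold).astype(int)


def precision_recall_f1(y_true, y_pred):
    """Point-wise scores with 1 (anomalous) as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Synthetic stand-in data; replace with the dataframes read from the CSV files.
rng = np.random.default_rng(0)
df_train = pd.DataFrame(rng.normal(size=(500, 2)), columns=["x1", "x3"])
df_test = pd.DataFrame(rng.normal(size=(200, 2)), columns=["x1", "x3"])
df_test.iloc[150:, 0] += 10.0  # inject an obvious shift in x1
df_test_labels = pd.DataFrame({"label": [0] * 150 + [1] * 50})

mean, std = fit_zscore(df_train)
y_pred = predict_zscore(df_test, mean, std)
precision, recall, f1 = precision_recall_f1(df_test_labels["label"], y_pred)
print(precision, recall, f1)
```

Any detector with a fit step on `df_train` and a prediction step on `df_test` can be dropped in place of the two z-score functions; the scoring function stays the same.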
The prediction can be performed over a moving window spanning the time series or using point-wise prediction.
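For the moving-window variant, point-wise labels and scores have to be aggregated per window. The sketch below shows one way to do that with pandas rolling windows. The helper `to_windows`, the window length of 3 and the rule that a window is anomalous as soon as it contains one anomalous point are assumptions; the dataset does not prescribe a convention.

```python
import pandas as pd


def to_windows(series, window, step=1, reducer="max"):
    """Aggregate a point-wise series over sliding windows.

    Returns one value per complete window, indexed by the position
    of the window's last sample.
    """
    rolled = series.rolling(window).agg(reducer)
    # Drop the incomplete leading windows, then subsample by `step`.
    return rolled.iloc[window - 1::step]


labels = pd.Series([0, 0, 0, 1, 1, 0, 0, 0])
scores = pd.Series([0.1, 0.2, 0.1, 0.9, 0.8, 0.3, 0.1, 0.2])

# Assumption: a window is anomalous if it contains at least one anomalous point.
window_labels = to_windows(labels, window=3).astype(int)
# Window score taken as the mean point-wise score (another possible choice is "max").
window_scores = to_windows(scores, window=3, reducer="mean")
print(window_labels.tolist())
```

Window-level predictions obtained by thresholding `window_scores` can then be compared with `window_labels` in the same way as point-wise predictions.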
The nominal benchmark uses only the columns `x1` and `x3` of the dataframe, but feel free to use fewer or more columns.
The other columns in `df_test_labels` are provided as extra columns that explain the origin of the anomalies.
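One way to explore these extra columns is to summarize them separately for normal and anomalous rows. The sketch below assumes the extra columns are numeric. The column name `p1` is a hypothetical placeholder, since the real column names are not listed here, and the helper name `summarize_extra_columns` is likewise illustrative.

```python
import pandas as pd


def summarize_extra_columns(df_test_labels, label_col="label"):
    """Describe each extra column separately for normal (0) and anomalous (1) rows.

    Assumption: all columns other than `label_col` are numeric.
    """
    extra_cols = [c for c in df_test_labels.columns if c != label_col]
    return df_test_labels.groupby(label_col)[extra_cols].agg(["min", "max", "mean"])


# Hypothetical example: `p1` stands in for one of the extra columns.
df_test_labels = pd.DataFrame({
    "label": [0, 0, 1, 1],
    "p1": [1.0, 1.0, 1.3, 1.5],
})
summary = summarize_extra_columns(df_test_labels)
print(summary)
```

A table of this kind makes it easier to see which extra column differs between normal and anomalous rows.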
If you are interested in a richer set of anomaly values, please contact me.