Time Series Classification with SEQL

Description

Mr-SEQL is time series classification software that uses multiple symbolic representations of time series.

SAX

SAX is a transformation method that converts a numeric vector to a symbolic representation, i.e. a sequence of symbols from a predefined alphabet of size a. SAX first computes the Piecewise Aggregate Approximation (PAA) of a time series and then transforms this approximation to a symbolic representation.

PAA reduces a time series of length L to a vector of length l (l < L; l is also the length of the symbolic sequence) by dividing the time series into l segments of equal length. Each segment is then replaced with its mean value.
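The PAA step can be sketched in a few lines of Python. This is a minimal illustration, not the code used by Mr-SEQL: the function name `paa` is hypothetical, and the sketch assumes that the series length is a multiple of the number of segments.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: replace each segment with its mean.

    Assumption for this sketch: len(series) is a multiple of n_segments.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    seg_len = len(series) // n_segments
    return [
        sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
        for i in range(n_segments)
    ]


# A series of length L = 8 reduced to a vector of length l = 4.
series = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0, 0.0, 2.0]
print(paa(series, 4))  # [2.0, 3.0, 7.0, 1.0]
```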

PAA is followed by a discretisation step which replaces each value of the PAA with a corresponding symbol. The symbol is selected from the alphabet based on the interval in which the value falls. There are a intervals, as many as the size of the alphabet, and each interval is associated with one symbol. Assuming that the time series is normally distributed, the interval boundaries are chosen so that every interval has equal probability under the standard normal distribution N(0,1).
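The discretisation step can be sketched as follows. This is an illustrative sketch rather than the Mr-SEQL implementation: the names `sax_breakpoints` and `discretise` are hypothetical, the alphabet "abcd" is chosen only for the example, and the input values are assumed to come from a series that has already been normalised to zero mean and unit variance.

```python
from bisect import bisect_right
from statistics import NormalDist


def sax_breakpoints(alphabet_size):
    """Cut points that split N(0,1) into alphabet_size equiprobable intervals."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        standard_normal.inv_cdf(k / alphabet_size)
        for k in range(1, alphabet_size)
    ]


def discretise(values, alphabet="abcd"):
    """Map each value to the symbol of the interval it falls in."""
    cuts = sax_breakpoints(len(alphabet))
    # bisect_right returns the index of the interval containing the value.
    return "".join(alphabet[bisect_right(cuts, v)] for v in values)


print(discretise([-1.0, -0.3, 0.3, 1.0]))  # abcd
```

With an alphabet of size 4 the cut points are roughly -0.67, 0 and 0.67, so each symbol covers a quarter of the probability mass of N(0,1).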

SFA

SFA also transforms a time series to a symbolic representation. The core differences between SAX and SFA are the choices of approximation and discretisation techniques. SFA uses the Discrete Fourier Transform (DFT) to approximate a time series.
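To illustrate the idea of approximating a series with the DFT, the sketch below computes the first few Fourier coefficients of a short series using only the Python standard library. It is not the SFA implementation: the function name `dft_coefficients` is hypothetical, and the discretisation that SFA applies to the coefficients is not shown.

```python
import cmath


def dft_coefficients(series, n_coeffs):
    """Return the first n_coeffs complex DFT coefficients of series."""
    n = len(series)
    return [
        sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(series)
        )
        for k in range(n_coeffs)
    ]


# Keep only the two lowest-frequency coefficients of a length-4 series.
coeffs = dft_coefficients([1.0, 2.0, 3.0, 4.0], 2)
print(coeffs)
```

Keeping only the low-frequency coefficients plays a role similar to PAA in SAX: it summarises the series with fewer values before discretisation.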

More information on SFA can be found here: https://github.com/patrickzib/SFA

SEQL

The original SEQL software and its description can be found here: https://github.com/heerme/seql-sequence-learner

Combination of SEQL and symbolic representation of time series

Single Representation

This is our first attempt at time series classification with SEQL. SEQL learns a linear classification model from the symbolic representation of time series (either SAX or SFA).

Multiple Representations

SEQL can be combined with symbolic representations of multiple resolutions and multiple domains.

Interpretation

As our classifier is linear, the model itself is interpretable. Furthermore, we can visualize the SAX features selected by SEQL in the raw time series domain.
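One way to see how a symbolic feature relates to the raw series is to map the positions where it occurs back to raw indices. The sketch below is a simplified illustration, not the visualisation code of this project: the function `feature_spans` is hypothetical, and it assumes a single SAX word computed over the whole series with non-overlapping segments of `seg_len` raw points each.

```python
def feature_spans(sax_word, feature, seg_len):
    """Return (start, end) raw-index ranges, end exclusive, where feature occurs.

    Assumption for this sketch: symbol i of sax_word covers raw points
    i * seg_len up to (i + 1) * seg_len.
    """
    spans = []
    start = sax_word.find(feature)
    while start != -1:
        spans.append((start * seg_len, (start + len(feature)) * seg_len))
        start = sax_word.find(feature, start + 1)
    return spans


print(feature_spans("abccbabcc", "bcc", 4))  # [(4, 16), (24, 36)]
```

The returned ranges could then be highlighted on a plot of the raw time series.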

Installation

To compile, execute the following commands in the src directory:

mkdir -p build
cd build
mkdir -p Release
cd Release
cmake -DCMAKE_BUILD_TYPE=Release ../../
make

How to Use

Convert time series to multiple SAX representations:

./sax_convert -i Coffee_TRAIN -o sax.train
./sax_convert -i Coffee_TEST -o sax.test

Classify with Ensemble SEQL (./saxdir/ will store the output of the program):

./mr_seql -t sax.train -T sax.test -o saxdir

SEQL can also be used for feature selection. The command above also writes the list of features selected by SEQL to a file. The following example uses sklearn Logistic Regression for classification with the selected features:

python mf_logreg.py saxdir

The steps for using the SFA representation are similar. The src folder contains a Python script that works with the Python port of SFA. To combine SFA features and SAX features for classification, simply add both directories to the above command: