/tsfuse

Python package for automatically constructing features from multiple time series

Primary LanguagePythonOtherNOASSERTION

TSFuse

Python package for automatically constructing features from multiple time series

PyPI tests


Installation

Install the latest release using pip:

pip install tsfuse

Quickstart

The example below shows the basic usage of TSFuse.

Data format

The input of TSFuse is a dataset where each instance is a window that consists of multiple time series and a label.

Time series

Time series are represented using a dictionary where each entry represents a univariate or multivariate time series. As an example, let's create a dictionary with two univariate time series:

from pandas import DataFrame
from tsfuse.data import Collection
X = {
    "x1": Collection(DataFrame({
        "id":   [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
        "time": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        "data": [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1],
    })),
    "x2": Collection(DataFrame({
        "id":   [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
        "time": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        "data": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    })),
}

The two univariate time series are named x1 and x2 and each series is represented as a Collection object. Each Collection is initialized with a DataFrame that has three columns:

  • id which is the identifier of each instance, i.e., each window,
  • time which contains the time stamps,
  • data contains the time series data itself.

For multivariate time series data, there can be multiple columns similar to the data column. For example, the data of a tri-axial accelerometer would have three columns x, y, z instead of data as it simultaneously measures the x, y, z acceleration.

Labels

There should be one target value for each window, so we create a Series where the index contains all unique id values of the time series data and the data consists of the labels:

from pandas import Series
y = Series(index=[0, 1, 2, 3], data=[0, 0, 1, 1])

Feature construction

To construct features, TSFuse provides a construct function which takes time series data X and target data y as input, and returns a DataFrame where each column corresponds to a feature. In addition, this function can return a computation graph which contains all transformation steps required to compute the features for new data:

from tsfuse import construct
features, graph = construct(X, y, return_graph=True)

To apply this computation graph to new data, simply call transform with a time series dictionary X as input:

features = graph.transform(X)

Documentation

The documentation is available on https://arnedb.github.io/tsfuse/

Citing TSFuse

If you use TSFuse for a scientific publication, please consider citing this paper:

De Brabandere, A., Op De Beéck, T., Hendrickx, K., Meert, W., & Davis, J. TSFuse: automated feature construction for multiple time series data. Machine Learning (2022)

@article{tsfuse,
    author  = {De Brabandere, Arne
               and Op De Be{\'e}ck, Tim
               and Hendrickx, Kilian
               and Meert, Wannes
               and Davis, Jesse},
    title   = {TSFuse: automated feature construction for multiple time series data},
    journal = {Machine Learning},
    year    = {2022},
    doi     = {10.1007/s10994-021-06096-2},
    url     = {https://doi.org/10.1007/s10994-021-06096-2}
}