sktime
sktime is a Python machine learning toolbox for time series with a unified interface for multiple learning tasks. We currently support:
- Forecasting,
- Time series classification,
- Time series regression.
sktime provides dedicated time series algorithms and scikit-learn compatible tools for building, tuning, and evaluating composite models.
For deep learning methods, see our companion package: sktime-dl.
Installation
The package is available via PyPI using:
pip install sktime
The package is actively being developed and some features may not be stable yet.
Development Version
To install the development version, please see our advanced installation instructions.
Quickstart
Forecasting
import numpy as np
from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss
y = load_airline()
y_train, y_test = temporal_train_test_split(y)
fh = np.arange(1, len(y_test) + 1) # forecasting horizon
forecaster = ThetaForecaster(sp=12) # monthly seasonal periodicity
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
smape_loss(y_test, y_pred)
>>> 0.1722386848882188
For more, check out the forecasting tutorial.
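The two helper steps used in this example can be sketched in plain NumPy. This is an illustration only, not sktime's implementation: the function names `temporal_split` and `smape` are hypothetical, the 25% test fraction is an assumed default, and the formula is one common definition of the symmetric mean absolute percentage error, which may differ in detail from `smape_loss`.

```python
import numpy as np


def temporal_split(y, test_size=0.25):
    """Split a series into train and test parts without shuffling.

    Hypothetical helper: the last `test_size` fraction of observations
    becomes the test set, so the model is always evaluated on the future.
    """
    y = np.asarray(y, dtype=float)
    n_test = int(np.ceil(len(y) * test_size))
    return y[:-n_test], y[-n_test:]


def smape(y_true, y_pred):
    """Symmetric MAPE: mean of 2 * |y - y_hat| / (|y| + |y_hat|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ratio = 2.0 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred))
    return float(np.mean(ratio))


y = np.arange(1.0, 13.0)  # toy series: 1, 2, ..., 12
y_train, y_test = temporal_split(y)  # first 9 values, last 3 values
fh = np.arange(1, len(y_test) + 1)  # steps ahead: 1, 2, 3

# Naive forecast: repeat the last observed training value for every step in fh.
y_pred = np.full(len(fh), y_train[-1])
score = smape(y_test, y_pred)
```

Because the split keeps the time order, the forecasting horizon `fh` simply counts steps past the end of the training series.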
Time Series Classification
from sktime.datasets import load_arrow_head
from sktime.classification.compose import TimeSeriesForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
classifier = TimeSeriesForestClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred)
>>> 0.7924528301886793
For more, check out the time series classification tutorial.
Documentation
- Watch our online tutorial on Machine Learning with Time Series at PyData Amsterdam 2020: [video], [repo]
- Check out our example notebooks - you can run them on Binder without having to install anything!
- Read our detailed API reference.
API Overview
sktime is a unified toolbox for machine learning with time series. Time series give rise to multiple learning tasks (e.g. forecasting and time series classification). The goal of sktime is to provide all the necessary tools to solve these tasks, including dedicated time series algorithms as well as tools for building, tuning and evaluating composite models.
Many of these tasks are related. An algorithm that can solve one of them can often be re-used to help solve another, an idea called reduction. sktime's unified interface makes it easy to adapt an algorithm for one task to another.
For example, to use a regression algorithm to solve a forecasting task, we can simply write:
import numpy as np
from sktime.datasets import load_airline
from sktime.forecasting.compose import ReducedRegressionForecaster
from sklearn.ensemble import RandomForestRegressor
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss
y = load_airline()
y_train, y_test = temporal_train_test_split(y)
fh = np.arange(1, len(y_test) + 1) # forecasting horizon
regressor = RandomForestRegressor()
forecaster = ReducedRegressionForecaster(regressor, window_length=12)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
smape_loss(y_test, y_pred)
>>> 0.12726230426056875
For more details, check out our paper.
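To make the idea of reduction concrete, here is a minimal sketch that does not use sktime at all. It assumes one common reduction strategy, sliding windows with recursive prediction, and is not a description of how `ReducedRegressionForecaster` is implemented. The helper names `make_windows` and `recursive_forecast` are hypothetical, and `LinearRegression` stands in for any scikit-learn regressor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_windows(y, window_length):
    """Turn a series into a tabular regression problem.

    Each row of X holds `window_length` consecutive values; the matching
    target is the value that immediately follows that window.
    """
    y = np.asarray(y, dtype=float)
    X = np.array([y[i:i + window_length] for i in range(len(y) - window_length)])
    return X, y[window_length:]


def recursive_forecast(regressor, y_train, n_steps, window_length):
    """Fit on sliding windows, then forecast by feeding predictions back in."""
    X, target = make_windows(y_train, window_length)
    regressor.fit(X, target)
    window = list(np.asarray(y_train, dtype=float)[-window_length:])
    predictions = []
    for _ in range(n_steps):
        next_value = float(regressor.predict(np.array([window]))[0])
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return np.array(predictions)


y_train = np.arange(50, dtype=float)  # toy series with a linear trend
y_pred = recursive_forecast(LinearRegression(), y_train, n_steps=5, window_length=3)
```

On this toy series the regressor only ever sees ordinary tabular data, yet the forecasts continue the trend. Any estimator with `fit` and `predict` could be swapped in for `LinearRegression`.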
Currently, sktime provides:
- State-of-the-art algorithms for time series classification and regression, ported from the Java-based tsml toolkit, as well as forecasting,
- Transformers, including single-series transformations (e.g. detrending or deseasonalization) and series-as-features transformations (e.g. feature extractors), as well as tools to compose different transformers,
- Pipelining,
- Tuning,
- Ensembling, such as a fully customisable random forest for time series classification and regression, as well as ensembling for multivariate problems.
For a list of implemented methods, see our estimator overview.
In addition, sktime includes an experimental high-level API that unifies multiple learning tasks, partially inspired by the APIs of mlr and openML.
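As an illustration of the series-as-features transformers and pipelining listed above, the following sketch uses only NumPy and scikit-learn rather than sktime's own classes. It assumes the panel data is a 2D array with one equal-length series per row, which is a simplification and not sktime's data container. The function `summary_features` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.tree import DecisionTreeClassifier


def summary_features(X):
    """Map each series (one row of X) to three features: mean, std, slope."""
    X = np.asarray(X, dtype=float)
    time = np.arange(X.shape[1])
    slope = np.polyfit(time, X.T, deg=1)[0]  # least-squares slope per series
    return np.column_stack([X.mean(axis=1), X.std(axis=1), slope])


rng = np.random.default_rng(0)
time = np.arange(30)
# Synthetic panel: class 0 series trend upwards, class 1 series trend downwards.
X_up = 0.5 * time + rng.normal(scale=1.0, size=(20, 30))
X_down = -0.5 * time + rng.normal(scale=1.0, size=(20, 30))
X = np.vstack([X_up, X_down])
y = np.array([0] * 20 + [1] * 20)

# Series-as-features transformer followed by a standard scikit-learn classifier.
pipeline = make_pipeline(
    FunctionTransformer(summary_features),
    DecisionTreeClassifier(random_state=0),
)
pipeline.fit(X[::2], y[::2])  # even rows for training
accuracy = pipeline.score(X[1::2], y[1::2])  # odd rows for testing
```

The transformer turns every series into a fixed-length feature vector, so the rest of the pipeline can be tuned and evaluated with the usual scikit-learn tools.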
Development Roadmap
sktime is under active development. We're looking for new contributors, and all contributions are welcome! Planned work includes:
- Multivariate/panel forecasting based on a modified pysf API,
- Unsupervised learning, including time series clustering,
- Time series annotation, including segmentation and outlier detection,
- Specialised data container for efficient handling of time series/panel data in a modelling workflow and separation of time series meta-data,
- Probabilistic modelling framework for time series, including survival and point process models based on an adapted skpro interface.
For more details, read this issue.
How to contribute
- First check out our guide on how to contribute.
- Chat with us or raise an issue if you get stuck or have questions.
- Please also read our Code of Conduct and Governance document.
For former and current contributors, see our overview.
How to cite sktime
If you use sktime in a scientific publication, we would appreciate citations to the following paper:
BibTeX entry:
@inproceedings{sktime,
author = {L{\"{o}}ning, Markus and Bagnall, Anthony and Ganesh, Sajaysurya and Kazakov, Viktor and Lines, Jason and Kir{\'{a}}ly, Franz J},
booktitle = {Workshop on Systems for ML at NeurIPS 2019},
title = {{sktime: A Unified Interface for Machine Learning with Time Series}},
date = {2019},
}