
A modular and extensible end-to-end library for time series anomaly detection

Primary language: Python. License: European Union Public License 1.2 (EUPL-1.2).

anomalearn: time series anomaly detection library

Badges

  • PyPI: package, status, format, Python version, license
  • Repository: maintenance
  • Code: linting with pycodestyle, flake8 and pylint; imports sorted with isort
  • Docstrings: NumpyDoc

Status

The current version of the library (the first version) is a pre-release: the library is still under development and more content is planned. However, we feel that people can already start to use it and contribute to it. Please refer to the documentation for usage and contribution guidelines.

What is it?

anomalearn is a Python package that provides modular and extensible functionalities for developing anomaly detection methods for time series data, reading publicly available time series anomaly detection datasets, loading data for experiments, and evaluating datasets. Additionally, the development plans for anomalearn include the implementation of several state-of-the-art and historical anomaly detection methods, and of objects that automate the training process of methods. See the Discussion and development section for more details.

Documentation

Every functionality in anomalearn is documented. The official documentation is hosted at https://marcopetri98.github.io/anomalearn/index.html.

Main features

The features offered by anomalearn are:

  • Implementation of state-of-the-art and historical anomaly detection methods for time series. The bare models are located in anomalearn.algorithms.models. A bare model is the model itself, without any preprocessing or postprocessing operations.
  • Implementation of data readers of commonly used publicly accessible time series anomaly detection datasets. Data readers are all located in the package anomalearn.reader or in anomalearn.reader.time_series. All data readers return a pandas DataFrame.
  • Implementation of some data analysis functions, such as simplicity scoring functions, stationarity tests and time series decomposition functions. These functions are all located in anomalearn.analysis.
  • Implementation of helpers for creating experiments. Currently, only the data loading helper has been implemented. It takes data readers and returns all or a subset of the series, with either a default or a specific split. The experiment helpers are all located in anomalearn.applications.
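
To illustrate the distinction between a bare model and the preprocessing and postprocessing around it, here is a minimal pure-Python sketch. It does not use the anomalearn API: the functions standardize, score and flag are hypothetical names, and the distance-from-median scorer is a stand-in chosen for brevity, not one of the library's models.

```python
from statistics import mean, median, pstdev
from typing import List


def standardize(series: List[float]) -> List[float]:
    """Preprocessing (hypothetical): rescale to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def score(series: List[float]) -> List[float]:
    """Bare model (stand-in): anomaly score as distance from the median."""
    center = median(series)
    return [abs(x - center) for x in series]


def flag(scores: List[float], threshold: float) -> List[bool]:
    """Postprocessing (hypothetical): turn scores into binary labels."""
    return [s > threshold for s in scores]


# A toy series with one obvious spike at position 4.
series = [1.0, 1.1, 0.9, 1.0, 8.0, 1.0, 0.95, 1.05]
labels = flag(score(standardize(series)), threshold=2.0)
print(labels)  # only the spike is flagged
```

Keeping the three stages separate means the scoring step can be swapped or evaluated on its own, which is the idea behind exposing bare models independently of the operations around them.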

Installation

The source code is available in the anomalearn GitHub repository.

Currently, the library is distributed only through the Python Package Index (PyPI).

# install from PyPI
pip install anomalearn --pre
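
The --pre flag tells pip to also consider pre-release and development versions, which matters here because the current version of the library is a pre-release (for example v0.0.2a1). As a rough illustration, the sketch below checks whether a version string carries a pre-release or development marker. The helper name is_prerelease is hypothetical, and the regular expression is a simplification of the PEP 440 rules rather than a full parser.

```python
import re

# Simplified check: a, b, rc (pre-release) or dev (development) markers
# directly after the numeric release segment. Not a full PEP 440 parser.
_PRERELEASE = re.compile(r"^v?\d+(\.\d+)*[._-]?(a|b|rc|dev)\d*", re.IGNORECASE)


def is_prerelease(version: str) -> bool:
    """Return True if the version string looks like a pre-release."""
    return _PRERELEASE.match(version.strip()) is not None


print(is_prerelease("0.0.2a1"))  # True: alpha version mentioned in this README
print(is_prerelease("1.0.0"))    # False: a hypothetical stable version
```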

Installation from source

First, download or clone the repository and place it in any location on your computer. We will call this location REPO_PATH. Open the terminal and navigate to the folder:

cd REPO_PATH

Then, install the library using pip:

pip install .
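
After either installation route, one way to confirm that the package is visible to the current interpreter is to query its metadata. The sketch below uses only the standard library. The helper name installed_version is an illustrative choice, and the distribution name anomalearn is assumed to match the name used in the pip command for PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("anomalearn")
if found is None:
    print("anomalearn is not installed in this environment")
else:
    print(f"anomalearn {found} is installed")
```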

Dependencies

This library builds on existing high-quality Python packages for machine learning and general programming:

  • NumPy: adds support for efficient array operations.
  • SciPy: adds support for scientific computing.
  • Numba: adds a just-in-time compiler for functions that have to be efficient, while keeping the package pure Python.
  • pandas: adds support for working with data structures.
  • scikit-learn: adds support for model development.
  • scikit-optimize: adds support for searching the hyper-parameters of models.
  • statsmodels: adds support for statistical tests and models.
  • Matplotlib: adds support for plotting.

Getting help

For the moment, the suggested way to get help is to post questions on Stack Overflow. Until the community grows, consider also sending the URL of the question to the author via email.

Background

This work started with Marco Petri's thesis. The work initially aimed to develop new anomaly detection methods for time series that would reach state-of-the-art performance. However, given the scarcity of tools specifically aimed at time series anomaly detection, the thesis instead developed anomalearn and a way to evaluate the simplicity of a dataset. The very first version of the library (v0.0.2a1) is the one presented and described in the thesis. From that point on, the library will receive updates beyond the scope of the thesis.

Discussion and development

Currently, the development of the first stable version of anomalearn is ongoing. If you want to use it, you can help us test its functionalities by providing feedback on the clarity of the documentation, the naming of functions and the ease of use, and by proposing new functionalities to implement.

In the future, once the first stable version is published, a structured and well-documented guide on how to contribute to the library will be written. For the moment, all discussions related to development, requests and proposals should be placed in the GitHub Discussions page.

Contributing to code

First, download or clone the repository and place it in any location on your computer. We will call this location REPO_PATH. Open the terminal and navigate to the folder:

cd REPO_PATH

The library uses poetry for managing dependencies, building, and publishing. It is therefore strongly recommended to read the poetry documentation carefully before contributing to the library or installing it from source. Note that the installed version of poetry must be at least 1.4.1.

poetry init

Now, poetry will recognize the project. You can install the library and its dependencies from the poetry lock file, so that every contributor uses exactly the same package versions:

# this command will install the library using the lock file
poetry install
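
Since poetry must be at least version 1.4.1, it can be useful to check the installed version before running the commands above. The following is a minimal sketch under stated assumptions: the helper names parse_version and poetry_is_recent_enough are hypothetical, and the sample strings only imitate what poetry --version prints, whose exact wording may differ between releases.

```python
import re
from typing import Tuple

MINIMUM_POETRY = (1, 4, 1)  # minimum version stated in this README


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first X.Y.Z number found in the text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def poetry_is_recent_enough(version_output: str) -> bool:
    """Compare the reported poetry version against the required minimum."""
    return parse_version(version_output) >= MINIMUM_POETRY


# Sample strings imitating the output of `poetry --version` (assumed format).
print(poetry_is_recent_enough("Poetry (version 1.4.1)"))  # True
print(poetry_is_recent_enough("Poetry (version 1.3.2)"))  # False
```

To check a real installation, the text passed to poetry_is_recent_enough could be captured from the command line, for instance with subprocess.run(["poetry", "--version"], capture_output=True, text=True).stdout.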

Now, you can add functionalities to the library. To ask for changes to be merged, create a pull request. However, it is strongly suggested to first ask whether a feature fits in anomalearn, so that it does not violate any design choice.

Citation

If you find this library useful, please cite the master's thesis from which it originated.

@mastersthesis{Petri2023Anomalearn,
  author = {Petri, Marco},
  school = {Politecnico di Milano},
  title  = {Anomalearn: a modular and extensible library for the development of time series anomaly detection models},
  year   = {2023},
  month  = {may},
  type   = {mathesis},
  doi    = {10.13140/RG.2.2.21679.10406},
  url    = {http://dx.doi.org/10.13140/RG.2.2.21679.10406}
}