〰️ hctsa 〰️: highly comparative time-series analysis
hctsa is a software package for running highly comparative time-series analysis using Matlab (full support for versions R2018b or later).
The software provides a code framework that enables the extraction of thousands of time-series features from a time series (or a time-series dataset). It also provides a range of tools for visualizing and analyzing the resulting time-series feature matrix, including:
- Normalizing and clustering the data,
- Producing low-dimensional representations of the data,
- Identifying and interpreting discriminating features between different classes of time series,
- Learning multivariate classification models.
Feel free to email me for help with real-world applications of hctsa 🤓
Acknowledgement 👍
If you use this software, please read and cite these open-access articles:
- B.D. Fulcher and N.S. Jones. hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems 5, 527 (2017).
- B.D. Fulcher, M.A. Little, N.S. Jones. Highly comparative time-series analysis: the empirical structure of time series and their methods. J. Roy. Soc. Interface 10, 83 (2013).
Feedback, whether by email, GitHub issues, or pull requests, is much appreciated.
For commercial use of hctsa, including licensing and consulting, contact Engine Analytics.
Getting Started 😊
Documentation 📖
Comprehensive documentation for hctsa, from getting started through to more advanced analyses, is on GitBook.
Downloading the repository ⬇️
For users unfamiliar with git, the current version of the repository can be downloaded by simply clicking the green Code button, and then clicking Download ZIP.
It is recommended to use the repository with git. To do this, fork the repository, clone your fork to your local machine, and then set an upstream remote to keep it synchronized with the main repository, e.g., using the following code:
git remote add upstream git@github.com:benfulcher/hctsa.git
(Make sure that you have generated an SSH key and associated it with your GitHub account.)
You can then update to the latest stable version of the repository by pulling the master branch to your local repository:
git pull upstream master
For analyzing specific datasets, we recommend working outside of the repository so that incremental updates can be pulled from the upstream repository. Details on how to merge the latest version of the repository with the local changes in your fork can be found here.
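The fork-and-sync workflow described above can be wrapped in two small shell helpers. This is only a minimal sketch: the function names `add_upstream` and `sync_upstream` are hypothetical (they are not part of hctsa), and it assumes that the commands are run from inside your local clone and that the upstream branch is called `master`, as in the command above.

```shell
#!/usr/bin/env bash
# Sketch of helpers for keeping a fork in sync with the main repository.
# NOTE: add_upstream and sync_upstream are illustrative names, not hctsa tools.
set -euo pipefail

# Add the "upstream" remote, or update its URL if the remote already exists.
add_upstream() {
    local url="$1"
    if git remote get-url upstream >/dev/null 2>&1; then
        git remote set-url upstream "$url"
    else
        git remote add upstream "$url"
    fi
}

# Fetch one upstream branch and fast-forward the checked-out branch to it.
sync_upstream() {
    local branch="${1:-master}"   # assumed default branch name
    # Stop early if tracked files have uncommitted changes.
    if ! git diff --quiet HEAD; then
        echo "Uncommitted changes found; commit or stash them first." >&2
        return 1
    fi
    git fetch upstream "$branch"
    # --ff-only refuses to merge if the local branch has diverged from upstream.
    git merge --ff-only "upstream/$branch"
}
```

To use the helpers, call `add_upstream` once with the URL of the main repository, and then `sync_upstream master` whenever you want the latest version. The fast-forward-only merge is a deliberately cautious choice: if you have committed local changes inside the repository, it stops rather than creating a merge commit, which fits the recommendation to keep dataset-specific work outside the repository.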
Related resources
CompEngine 💥
CompEngine is an accompanying web resource for this project. It is a self-organizing database of time-series data that allows users to upload, explore, and compare thousands of diverse types of time-series data. This vast and growing collection of time-series data can also be downloaded. Go have a play, read more about it in our 📙paper, or watch a talk on YouTube.
catch22 2️⃣2️⃣
Is over 7000 just a few too many features for your application? Do you not have access to a Matlab license? catch22 has all of your faux-rhetorical questions covered. This reduced set of 22 features, determined through a combination of classification performance and mutual redundancy as explained in this paper, is available here as an efficiently coded C implementation with wrappers for Python and R.
hctsa datasets and example workflows 💾
There are a range of open datasets with pre-computed hctsa features, as well as some examples of hctsa workflows.
- C. elegans movement speed data and associated analysis code.
- Drosophila movement speed and associated analysis code.
- 1000 empirical time series
(If you have data to share and host, let me know and I'll add it to this list)
Running hctsa on a cluster 💻
Matlab code for computing features for an initialized HCTSA.mat file, by distributing the computation across a large number of cluster jobs (using PBS or Slurm schedulers), is here.
Publications 📕
hctsa has been used by us and others to do new science in neuroscience, engineering, and biomedicine. An updated list of publications using hctsa is on this wiki page.
hctsa licenses
Internal licenses
There are two licenses applied to the core parts of the repository:
- The framework for running hctsa analyses and visualizations is licensed as the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. A license for commercial use is available from Engine Analytics.
- Code for computing features from time-series data is licensed as GNU General Public License version 3.
A range of external code packages is provided in the Toolboxes directory of the repository, and each has its own associated license (as outlined below).
External packages and dependencies
Many features in hctsa rely on external packages and Matlab toolboxes. In the case that some of them are unavailable, hctsa can still be used, but only a reduced set of time-series features will be computed.
hctsa uses the following Matlab toolboxes: Statistics, Signal Processing, Curve Fitting, System Identification, Wavelet, and Econometrics.
The following external time-series analysis code packages are provided with the software (in the Toolboxes directory), and are used by our main feature-extraction algorithms to compute meaningful structural features from time series:
- TISEAN package for nonlinear time-series analysis, version 3.0.1 (GPL license).
- TSTOOL package for nonlinear time-series analysis, version 1.2 (GPL license).
- Joseph T. Lizier's Java Information Dynamics Toolkit (JIDT) for studying information-theoretic measures of computation in complex systems, version 1.3 (GPL license).
- Time-series analysis code developed by Michael Small (unlicensed).
- Max Little's Time-series analysis code (GPL license).
- Sample Entropy code from Physionet (GPL license).
- ARFIT Toolbox for AR model estimation (unlicensed).
- gpml Toolbox for Gaussian Process regression model estimation, version 3.5 (FreeBSD license).
- Danilo P. Mandic's delay vector variance code (GPL license).
- Cross Recurrence Plot Toolbox (GPL license).
- Zoubin Ghahramani's Hidden Markov Model (HMM) code (MIT license).
- Danny Kaplan's Code for embedding statistics (GPL license).
- Two-dimensional histogram code from Matlab Central (BSD license).
- Various histogram and entropy code by Rudy Moddemeijer (unlicensed).
Other time-series analysis resources
A collection of good resources for time-series analysis (including in other programming languages like Python and R) is on the wiki.
Acknowledgements 👋
Many thanks go to Romesh Abeysuriya for helping with the MySQL database set-up and install scripts, and Santi Villalba for lots of helpful feedback and advice on the software.