Author: Arjun Ashok
Mentors: Jitendra Singh, Devyani Lambhate
This repository contains code for multivariate time series representation learning based on two self-supervised learning methods, along with code for fine-tuning/adapting the learned representations to downstream tasks.
The supported methods are:
- TS2Vec
- CoST
The supported downstream tasks are:
- Multi-horizon forecasting, both univariate and multivariate
- Regression
- Classification
- Anomaly Detection
The recommended requirements for TS2Vec are as follows:
- Python 3.8
- torch==1.8.1
- scipy==1.6.1
- numpy==1.19.2
- pandas==1.0.1
- scikit_learn==0.24.2
- statsmodels==0.12.2
- Bottleneck==1.3.2
The dependencies can be installed by:
```bash
pip install -r requirements.txt
```
The datasets can be obtained and placed in the `datasets/` folder as follows:
- The 128 UCR datasets should be put into `datasets/UCR/` so that each data file can be located by `datasets/UCR/<dataset_name>/<dataset_name>_*.csv`.
- The 30 UEA datasets should be put into `datasets/UEA/` so that each data file can be located by `datasets/UEA/<dataset_name>/<dataset_name>_*.arff`.
- The 3 ETT datasets should be placed at `datasets/ETTh1.csv`, `datasets/ETTh2.csv`, and `datasets/ETTm1.csv`.
- The Electricity dataset should be preprocessed using `datasets/preprocess_electricity.py` and placed at `datasets/electricity.csv`.
- The Yahoo dataset should be preprocessed using `datasets/preprocess_yahoo.py` and placed at `datasets/yahoo.pkl`.
- The KPI dataset should be preprocessed using `datasets/preprocess_kpi.py` and placed at `datasets/kpi.pkl`.
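To quickly verify the layout, a minimal sketch like the one below can be run from the repository root (the dataset names listed are examples only; adjust them to the datasets you actually downloaded):

```python
from pathlib import Path

# Minimal layout sanity check. The entries below are examples only;
# edit the list to match the datasets you actually downloaded.
expected = [
    "datasets/UCR/ECG200",      # one folder per UCR dataset
    "datasets/ETTh1.csv",
    "datasets/electricity.csv",
    "datasets/yahoo.pkl",
]
for p in expected:
    print("ok     " if Path(p).exists() else "MISSING", p)
```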
To train and evaluate TS2Vec on a dataset, run the following command:
```bash
python train.py <dataset_name> <run_name> --loader <loader> --batch-size <batch_size> --repr-dims <repr_dims> --gpu <gpu> --eval
```
Detailed descriptions of the arguments are as follows:
Parameter name | Description
---|---
`dataset_name` | The dataset name
`run_name` | The folder name used to save the model, outputs, and evaluation metrics. This can be set to any word
`loader` | The data loader used to load the experimental data. This can be set to `UCR`, `UEA`, `forecast_csv`, `forecast_csv_univar`, `anomaly`, or `anomaly_coldstart`
`batch_size` | The batch size (defaults to 8)
`repr_dims` | The representation dimensions (defaults to 320)
`gpu` | The GPU number used for training and inference (defaults to 0)
`eval` | Whether to perform evaluation after training
`method` | The method to use (either `cost` or `ts2vec`)
(For descriptions of more arguments, run `python train.py -h`.)
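For example, to train and evaluate on the UCR dataset ECG200 (the run name here is arbitrary; the other values shown are the documented defaults):

```bash
python train.py ECG200 my_run --loader UCR --batch-size 8 --repr-dims 320 --gpu 0 --eval
```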
After training and evaluation, the trained encoder, outputs, and evaluation metrics can be found in `training/DatasetName__RunName_Date_Time/`.
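To reuse a trained encoder in your own code, something like the following sketch should work, assuming the encoder exposes a `load` method as in the original TS2Vec implementation; the run directory and checkpoint filename (`model.pkl`) are assumptions, so check the actual contents of your `training/` folder:

```python
from ts2vec import TS2Vec

# Re-load a trained encoder for inference. NOTE: the run directory and
# the checkpoint filename (model.pkl) are assumptions; check the actual
# contents of your training/ folder.
model = TS2Vec(input_dims=1, device=0, output_dims=320)
model.load('training/ECG200__my_run_20240101_120000/model.pkl')
```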
Scripts: The scripts for reproduction are provided in the `scripts/` folder.
An example with TS2Vec as the method is given below:
```python
from ts2vec import TS2Vec
import datautils

# Load the ECG200 dataset from the UCR archive
train_data, train_labels, test_data, test_labels = datautils.load_UCR('ECG200')
# (Both train_data and test_data have a shape of n_instances x n_timestamps x n_features)

# Train a TS2Vec model
model = TS2Vec(
    input_dims=1,
    device=0,
    output_dims=320
)
loss_log = model.fit(
    train_data,
    verbose=True
)

# Compute timestamp-level representations for the test set
test_repr = model.encode(test_data)  # n_instances x n_timestamps x output_dims

# Compute instance-level representations for the test set
test_repr = model.encode(test_data, encoding_window='full_series')  # n_instances x output_dims

# Sliding inference for the test set
test_repr = model.encode(
    test_data,
    causal=True,
    sliding_length=1,
    sliding_padding=50
)  # n_instances x n_timestamps x output_dims
# (The representation at timestamp t is computed using only the observations located in [t-50, t])
```
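The instance-level representations can be fed to any standard downstream model. Below is a minimal sketch that fits a scikit-learn SVM classifier on top of the frozen encoder; it is illustrative only, and the repository's built-in evaluation (the `--eval` flag) may use a different protocol:

```python
from sklearn.svm import SVC

# Fit a simple classifier on top of the frozen instance-level
# representations. Illustrative only; the repository's own evaluation
# may use a different protocol (e.g., a hyperparameter search).
train_repr = model.encode(train_data, encoding_window='full_series')
test_repr = model.encode(test_data, encoding_window='full_series')

clf = SVC(C=1.0, kernel='rbf')
clf.fit(train_repr, train_labels)
print('Test accuracy:', clf.score(test_repr, test_labels))
```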
This codebase uses parts of code from the following repositories: