TSI

TSI: Indexing and classifying gigabytes of time series under time warping.

Classifying large time series dataset by searching through a hierarchical k-means tree with Dynamic Time Warping. For more information about the theory behind the algorithm, refer to the paper Indexing and classifying gigabytes of time series under time warping

Preamble

This is the source code for the work published in SDM 2017: Indexing and classifying gigabytes of time series under time warping

Authors: Chang Wei Tan, Geoffrey I. Webb, Francois Petitjean

When using this repository, please cite:

@INPROCEEDINGS{Tan2017-SDM,
  author = {Tan, Chang Wei and Webb, Geoffrey I. and Petitjean, Francois},
  title = {Indexing and classifying gigabytes of time series under time warping},
  booktitle = {SIAM International Conference on Data Mining},
  year = 2017,
  pages = {1--10}
}

If you want to use the code or found any bug in the code, please drop me an email at chang.tan@monash.edu. Thanks!

Executing the code

Before using the code, ensure that these files and folders exist in your project directory which can be obtained from https://cloudstor.aarnet.edu.au/plus/index.php/s/pRLVtQyNhxDdCoM and https://drive.google.com/open?id=0B8Cg6Izm3IJybWxnWDJPeWZQWVk

Output folder: outputs/L experiment exists in your project directory
SITS folder: dataset/SITS_2006_NDVI_C/SITS1M_foldfold number (e.g.dataset/SITS_2006_NDVI_C/SITS1M_fold1)
UCR folder: dataset/UCR_Time_Series_Archive/UCR datasets name
Also, make sure that in the dataset folders, you have a csv file for the properties of the dataset. The file should have the following format: "nb of class,size of training set,size of testing set,time series length,warping window". Check the UCR Time Series website, http://www.cs.ucr.edu/~eamonn/time_series_data/ for the properties of each dataset. For SITS, the properties are "24,900000,100000,46,4". Feel free to change these parameters to suit your program.
Exact indices of 1NN-DTW using best warping window: index1NN/SITS or UCR/SITS1M_fold#_1NN_LB_index1NN.csv or UCRDataset_1NN_LB_index1NN.csv. The indices are sorted in the order downloaded from the UCR Time Series website.

The main files to run the program are

SITS_NNDTW.java
SITS_NNED.java
SITS_TSI.java
UCR_NNDTW.java
UCR_TSI.java

Example 1, Running TSI on UCR dataset 50words.

Run UCR_TSI.java or SITS_TSI.java

Assuming using Eclipse, go to Run Configurations > Arguments

The program arguments are

Project path : project directory/src
Dataset name : 50words / change to SITS1M_fold# for SITS
Branching factor : 3
Max k-means iterations : 10
Number of NN : 1
Test number : 10
Candidate intervals to record results : 10
Number of results to record before seeing 1st candidate : 5
Number of time series to examine per query : 10
Warping window size (in terms of length) : 10

Alternatively, can change these parameters individually in the code

Example 2, Running NNDTW on UCR dataset 50words.

Run UCR_NNDTW.java, SITS_NNDTW.java or SITS_NNED.java (if NNED, there will be no warping window)