Website: https://sites.google.com/site/himeicdm/
To download and run the executable jar with demo:
git clone https://github.com/flash121123/HIME.git
cd HIME/bin
. demo.sh
res.txt contains the motif discovery result. Each entry has the following fields:
[First Instance Start Location][First Instance End Location][Second Instance Start Location][Second Instance End Location][Length][Distance]
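For post-processing outside MATLAB, the six fields above can be read into named records. The following is a minimal Python sketch, not part of HIME. It assumes each line of res.txt holds the six values as whitespace-separated numbers in the order listed above; check your own res.txt before relying on it. The names MotifPair, parse_result_line and load_results are hypothetical.

```python
from typing import List, NamedTuple


class MotifPair(NamedTuple):
    """One row of res.txt (field order taken from the README)."""
    first_start: int
    first_end: int
    second_start: int
    second_end: int
    length: int
    distance: float


def parse_result_line(line: str) -> MotifPair:
    # Assumption: six whitespace-separated numeric fields per line.
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    *locations, distance = fields
    # int(float(...)) tolerates values written as "100.0".
    a, b, c, d, n = (int(float(v)) for v in locations)
    return MotifPair(a, b, c, d, n, float(distance))


def load_results(path: str) -> List[MotifPair]:
    # Blank lines are skipped; malformed lines raise ValueError.
    with open(path) as handle:
        return [parse_result_line(row) for row in handle if row.strip()]
```

With the demo output in the current directory, load_results('res.txt') would return one MotifPair per result line.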
The MATLAB visualization code, tested dataset, and screenshot are available at http://bit.ly/2rvBETV
TO VISUALIZE DEMO IN MATLAB:
plotVLMotif('demo.txt','res.txt');
or
plotVLMotif('demo.txt','res.txt', [Motif Threshold You Want to Test]);
Using default parameters (enumeration starts from length 300):
java -Xmx8g -jar HIME_release.jar [DATASET] > tmp.log
Choosing the minimum length at which enumeration starts:
java -Xmx8g -jar HIME_release.jar [DATASET] [MINIMUM LENGTH] > tmp.log
Choosing PAA and minimum length:
java -Xmx8g -jar HIME_release.jar [DATASET] [PAA] [MINIMUM LENGTH] > tmp.log
Choosing PAA, Alphabet Size and minimum length:
java -Xmx8g -jar HIME_release.jar [DATASET] [PAA] [MINIMUM LENGTH] [ALPHABET SIZE] > tmp.log
Choosing PAA, Alphabet Size, minimum length and R(L)=xL:
java -Xmx8g -jar HIME_release.jar [DATASET] [PAA] [MINIMUM LENGTH] [ALPHABET SIZE] [x] > tmp.log
TO CONVERT LOGS TO RESULT DATA:
grep -i "Motif" tmp.log | cut -d' ' -f 2- > res.txt
-
The main process does not drop false-positive motifs, so res.txt may contain false positives. The threshold parameter is only used for the discretization step in the program (see the paper for details). You can filter out false positives using the distance reported for each motif in the output.
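One way to do this filtering is sketched below in Python. It is an illustration, not part of HIME. It assumes each result row is a 6-tuple in the order (first start, first end, second start, second end, length, distance), and that you supply your own threshold as a function of motif length. The function name filter_by_threshold is hypothetical. Whether the reported distance is on the same scale as R(L)=xL is an assumption to check against the paper.

```python
from typing import Callable, List, Sequence, Tuple

# (first_start, first_end, second_start, second_end, length, distance)
Row = Tuple[int, int, int, int, int, float]


def filter_by_threshold(
    rows: Sequence[Row],
    threshold: Callable[[int], float],
) -> List[Row]:
    """Keep rows whose distance is at most threshold(length)."""
    kept = []
    for row in rows:
        length, distance = row[4], row[5]
        if distance <= threshold(length):
            kept.append(row)
    return kept


rows = [
    (0, 99, 500, 599, 100, 2.0),
    (10, 209, 900, 1099, 200, 9.0),
]

# Length-proportional cutoff of the form x * L (x = 0.03 is an arbitrary example value).
proportional = filter_by_threshold(rows, lambda length: 0.03 * length)

# Fixed cutoff, independent of length.
fixed = filter_by_threshold(rows, lambda length: 10.0)
```

Passing the threshold as a function keeps one code path for both a fixed cutoff and a length-dependent one.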
-
The algorithm returns pairs of variable-length motif instances. You can use the MASS algorithm (http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html) to find exact instances. You may use the MDL metric, the original threshold, or a result-driven motif threshold to locate the occurring instances of a motif.
If you find the code useful, please cite the paper:
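To illustrate what locating instances with a threshold involves, here is a brute-force Python sketch. It is not MASS: MASS computes the z-normalized Euclidean distance profile efficiently with the FFT, while this version computes the same kind of profile with a plain sliding window and is only practical for short series. The function names and the rule for skipping overlapping matches are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def znorm(values: Sequence[float]) -> List[float]:
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def distance_profile(series: Sequence[float], query: Sequence[float]) -> List[float]:
    """Z-normalized Euclidean distance from query to every window of series."""
    m = len(query)
    q = znorm(query)
    profile = []
    for i in range(len(series) - m + 1):
        w = znorm(series[i:i + m])
        profile.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w))))
    return profile


def find_instances(series: Sequence[float], query: Sequence[float],
                   threshold: float) -> List[int]:
    """Start indices of non-overlapping windows within threshold of query."""
    m = len(query)
    profile = distance_profile(series, query)
    # Visit candidates from best to worst so the closest match wins an overlap.
    order = sorted(range(len(profile)), key=profile.__getitem__)
    hits: List[int] = []
    for i in order:
        if profile[i] > threshold:
            break
        if all(abs(i - j) >= m for j in hits):
            hits.append(i)
    return sorted(hits)
```

In practice the query would be one of the two instances reported in res.txt, cut from the original series by its start and end locations.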
@INPROCEEDINGS{hime2017,
author={Y. Gao and J. Lin},
booktitle={2017 IEEE International Conference on Data Mining (ICDM)},
title={Efficient discovery of time series motifs with large length range in million scale time series},
year={2017},
pages={1213-1222},
month={Nov}
}
Dataset references can be found in the paper.