pysax

A Python implementation of SAX (Symbolic Aggregate Approximation) for time-series data.

Idea

  1. Convert time-series data into a symbolic representation in which the (Euclidean) distance between series is lower-bounded by the distance in the symbolic space
  2. The symbolic representation can be viewed as a low-dimensional (aggregate) representation of the time series
  3. Symbol-based algorithms such as suffix trees and Markov chains can then be used to analyze time series
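Steps 1 and 2 above can be sketched as follows. This is a minimal sketch that assumes the textbook SAX recipe: z-normalization, then Piecewise Aggregate Approximation (PAA), then equiprobable breakpoints under N(0, 1). It also assumes the standard MINDIST lower bound. The names `znorm`, `paa`, `sax` and `mindist` are hypothetical illustrations, not the pysax API. For brevity, the sketch requires the series length to be a multiple of the word length.

```python
import math
from statistics import NormalDist


def znorm(series, eps=1e-8):
    """Z-normalize a sequence; a near-constant sequence is only mean-centered."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std < eps:
        return [x - mean for x in series]
    return [(x - mean) / std for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of w equal-length segments.
    Assumption of this sketch: len(series) is a multiple of w."""
    n = len(series)
    if n % w:
        raise ValueError("len(series) must be a multiple of w in this sketch")
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def breakpoints(a):
    """The a - 1 cut points that split N(0, 1) into a equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / a) for i in range(1, a)]


def sax(series, w, a):
    """Map a series to a word of w letters from an alphabet of size a."""
    cuts = breakpoints(a)
    letters = []
    for v in paa(znorm(series), w):
        idx = sum(v > c for c in cuts)  # number of cut points strictly below v
        letters.append(chr(ord("a") + idx))
    return "".join(letters)


def mindist(w1, w2, n, a):
    """MINDIST: a lower bound on the Euclidean distance between the two
    z-normalized length-n series that produced SAX words w1 and w2."""
    cuts = breakpoints(a)
    total = 0.0
    for c1, c2 in zip(w1, w2):
        r, c = sorted((ord(c1) - ord("a"), ord(c2) - ord("a")))
        if c - r > 1:  # adjacent or equal symbols contribute zero
            total += (cuts[c - 1] - cuts[r]) ** 2
    return math.sqrt(n / len(w1)) * math.sqrt(total)
```

For example, `sax([1, 2, 3, 4, 5, 6, 7, 8], 4, 4)` gives `"abcd"`. The `mindist` of two words never exceeds the Euclidean distance between the z-normalized series behind them. The resulting words are what symbol-based algorithms such as suffix trees or Markov chains would consume.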

References

  1. Paper
  2. Website
  3. jMotif application
  4. Tutorial
  5. R package
  6. Another Python implementation
  7. GrammarViz
  8. GrammarViz GitHub
  9. GrammarViz VSM GitHub
  10. jMotif GitHub

Why are we re-implementing it?

  1. SAX makes certain assumptions about time-series data, such as (1) locally Gaussian values, (2) a fixed sampling frequency, and (3) real-valued signals. We want to explore other possibilities for data that do not fit these assumptions
  2. We want a vector representation of time-series pieces, similar to the idea of representing words as vectors (Google's word2vec)
  3. We need a fast parallel implementation

TODO

examples

Python wrapper for Sequitur

Idea

  1. Sequitur will be used to extract a context-free grammar from SAX-encoded data
  2. The mined rules will be used for outlier/motif detection
  3. We wrap the C++ implementation for use from Python; this is only a quick workaround for now

References

  1. The three papers listed on the GrammarViz website
  2. Sequitur site
  3. Another Python Sequitur implementation
  4. A Java implementation, which is part of GrammarViz 2.0

how to use the wrapper

  1. Download the C++ code from http://sequitur.info/latest/sequitur.tgz
  2. Put the sequitur code from the uncompressed folder in a convenient location
  3. Use the pysequitur package and pass the path to sequitur as a constructor parameter
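Here is a hedged sketch of what "pass the path to sequitur as a constructor parameter" could look like. `SequiturRunner` is a hypothetical class, not the pysequitur API. It assumes the sequitur executable reads its input on stdin and writes its output to stdout. It deliberately takes any command-line flags as an argument rather than guessing them, so check the sequitur distribution for the flags your build supports.

```python
import subprocess


class SequiturRunner:
    """Hypothetical wrapper around an external sequitur executable.

    Assumption: the program reads its input on stdin and prints its result
    to stdout. Flags are passed through unchanged; none are assumed here.
    """

    def __init__(self, path, args=()):
        # Path to the compiled binary plus any flags the caller wants to pass.
        self.cmd = [path, *args]

    def run(self, text):
        """Feed text on stdin and return whatever the program prints."""
        result = subprocess.run(
            self.cmd,
            input=text,
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
        return result.stdout
```

Keeping the path in the constructor lets one configured runner be reused across many SAX-encoded inputs.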

why do we wrap it?

  1. To make it easier to use with pysax
  2. As we understand it, the C++ implementation treats each rule terminal as a single character, whereas pysax deals with words, so the words must first be mapped to single characters. This may change as our understanding of the code improves.
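The word-to-character mapping in item 2 can be sketched as below. This minimal sketch assumes that each distinct SAX word gets its own one-byte ASCII character (letters and digits, so at most 62 distinct words by default). It also assumes that rule bodies returned by sequitur are strings of those same characters. `encode_words` and `decode_chars` are hypothetical names.

```python
import string

# Assumption: one-byte ASCII symbols, so a byte-oriented binary sees one
# character per word. This caps the default at 62 distinct words.
DEFAULT_ALPHABET = string.ascii_letters + string.digits


def encode_words(words, alphabet=DEFAULT_ALPHABET):
    """Map each distinct word to one character, in order of first appearance.

    Returns the encoded string and a reverse table for decoding rules.
    """
    forward, backward = {}, {}
    out = []
    for w in words:
        if w not in forward:
            if len(forward) >= len(alphabet):
                raise ValueError("more distinct words than alphabet symbols")
            ch = alphabet[len(forward)]
            forward[w] = ch
            backward[ch] = w
        out.append(forward[w])
    return "".join(out), backward


def decode_chars(chars, backward):
    """Turn a string of encoded characters (e.g. a rule body) back into words."""
    return [backward[c] for c in chars]
```

For example, the word sequence `["abc", "abd", "abc", "bba"]` encodes to `"abac"`. A larger alphabet can be passed in when there are more distinct words, provided the binary handles those characters.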

TODO

examples

grammar-based outlier detection

Idea

  1. Implement the approach of "Time series anomaly discovery with grammar-based compression", which uses grammar analysis for time-series outlier detection
  2. Main steps:
      a. SAX-symbolize the time series
      b. numerosity reduction
      c. grammar induction by Sequitur
      d. map rules to subsequences
      e. mine the patterns
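Steps b, d and e can be sketched under a few stated assumptions. Numerosity reduction keeps only the first of each run of consecutive identical SAX words, recording its offset. A rule occurrence is mapped back to a half-open interval of the original series. The pattern mining in step e is illustrated with a rule density curve whose minima flag candidate outliers, one approach used in grammar-based anomaly discovery. Step c is left to Sequitur, and all function names are hypothetical.

```python
def numerosity_reduction(words):
    """Keep the first word of each run of identical consecutive SAX words.

    words[i] is the SAX word of the sliding window starting at offset i.
    Returns the kept words and the offset where each one was first seen.
    """
    kept, offsets = [], []
    for i, w in enumerate(words):
        if not kept or w != kept[-1]:
            kept.append(w)
            offsets.append(i)
    return kept, offsets


def occurrence_to_interval(offsets, first, last, window):
    """Map a rule occurrence spanning reduced words first..last (inclusive)
    to a half-open [start, end) interval of the original series."""
    return offsets[first], offsets[last] + window


def rule_density(n, intervals):
    """For each point of a length-n series, count the rule occurrences
    (half-open intervals) that cover it."""
    diff = [0] * (n + 1)  # difference array: +1 at start, -1 at end
    for start, end in intervals:
        diff[start] += 1
        diff[min(end, n)] -= 1
    density, running = [], 0
    for i in range(n):
        running += diff[i]  # prefix sum gives the coverage count
        density.append(running)
    return density


def low_density_points(density):
    """Indices where the density curve is at its minimum: candidate outliers,
    since few grammar rules explain those points."""
    lowest = min(density)
    return [i for i, d in enumerate(density) if d == lowest]
```

The difference-array trick computes the density in time linear in the series length plus the number of rule occurrences, rather than touching every covered point once per occurrence.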

TODO

examples