SciProgCentre/kmath

Feature-by-case request: Nonlinear time-series clustering based on their statistical similarity

sa18 opened this issue · 0 comments

sa18 commented

Given a numeric time series, for example temperature readings.

The goal is to color the series segment by segment, so that segments with the same color are statistically similar to each other. I would do it like this:

  1. Split the series into equal segments of a given length.
  2. For every pair of segments, perform a statistical similarity test. A result higher than 70% means the two segments are similar, and we assign them the same color. Otherwise, we assign different colors.
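
The two steps above can be sketched in plain Kotlin, without any kmath API. Everything named here is an assumption made for illustration: `splitIntoSegments`, `ksStatistic` and `colorSegments` are hypothetical helpers, the "result" of the test is taken to be `1 - D` where `D` is the two-sample Kolmogorov-Smirnov statistic, and colors are assigned greedily (a segment reuses the color of the first earlier segment it is similar to). Pairwise similarity is not transitive, so a real implementation may need a different grouping rule.

```kotlin
import kotlin.math.abs
import kotlin.math.max

/** Two-sample Kolmogorov-Smirnov statistic D: the largest gap between the two empirical CDFs. */
fun ksStatistic(a: DoubleArray, b: DoubleArray): Double {
    val x = a.sortedArray()
    val y = b.sortedArray()
    var i = 0
    var j = 0
    var d = 0.0
    while (i < x.size && j < y.size) {
        val v = minOf(x[i], y[j])
        // Advance past every value equal to v in both samples so ties are handled together.
        while (i < x.size && x[i] <= v) i++
        while (j < y.size && y[j] <= v) j++
        d = max(d, abs(i.toDouble() / x.size - j.toDouble() / y.size))
    }
    return d
}

/** Splits the series into consecutive segments of equal length; a shorter tail is dropped. */
fun splitIntoSegments(series: DoubleArray, length: Int): List<DoubleArray> {
    require(length > 0) { "segment length must be positive" }
    return (0 until series.size / length).map { k ->
        series.copyOfRange(k * length, (k + 1) * length)
    }
}

/** Greedy coloring (an assumption): reuse the color of the first similar earlier segment. */
fun colorSegments(
    segments: List<DoubleArray>,
    threshold: Double = 0.7,
    similarity: (DoubleArray, DoubleArray) -> Double = { a, b -> 1.0 - ksStatistic(a, b) }
): IntArray {
    val colors = IntArray(segments.size)
    var nextColor = 0
    for (i in segments.indices) {
        val match = (0 until i).firstOrNull { similarity(segments[it], segments[i]) > threshold }
        colors[i] = if (match != null) colors[match] else nextColor++
    }
    return colors
}

fun main() {
    val series = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 20.0, 21.0, 22.0, 23.0, 1.5, 2.5, 3.5, 4.5)
    val colors = colorSegments(splitIntoSegments(series, 4))
    println(colors.joinToString()) // prints "0, 1, 0"
}
```

The `similarity` parameter is a plain function, so another test (for example Cucconi) could be plugged in without touching the coloring logic.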

Expectation from the math library:

  1. Support for efficient storage of time series (in this case 1D, but in the more general case multidimensional).
  2. A functional library for performing statistical tests (Kolmogorov-Smirnov, Cucconi and others).
  3. Ability to generate permutations, incl. random ones (required by the Cucconi test implementation), with maximum performance and minimum memory consumption.
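
To make the third point concrete, here is a minimal sketch of what "random permutations with minimum memory" could look like: an in-place Fisher-Yates shuffle over one pooled `DoubleArray`, driving a Monte Carlo permutation test. This is not kmath API, and all names (`shuffleInPlace`, `meanDifference`, `permutationPValue`) are hypothetical. The statistic used is the absolute difference of means, a stand-in chosen for brevity; it is not the Cucconi statistic, which would be passed in through the `statistic` parameter instead.

```kotlin
import kotlin.math.abs
import kotlin.random.Random

/** In-place Fisher-Yates shuffle: a uniformly random permutation using O(1) extra memory. */
fun shuffleInPlace(data: DoubleArray, rng: Random) {
    for (i in data.lastIndex downTo 1) {
        val j = rng.nextInt(i + 1) // uniform index in 0..i
        val tmp = data[i]
        data[i] = data[j]
        data[j] = tmp
    }
}

/** Stand-in statistic (NOT Cucconi): absolute difference of the two sample means. */
fun meanDifference(pooled: DoubleArray, firstSize: Int): Double {
    var sumA = 0.0
    var sumB = 0.0
    for (k in pooled.indices) {
        if (k < firstSize) sumA += pooled[k] else sumB += pooled[k]
    }
    return abs(sumA / firstSize - sumB / (pooled.size - firstSize))
}

/** Monte Carlo permutation p-value: share of shuffled splits at least as extreme as observed. */
fun permutationPValue(
    a: DoubleArray,
    b: DoubleArray,
    rounds: Int = 2000,
    rng: Random = Random(42),
    statistic: (DoubleArray, Int) -> Double = ::meanDifference
): Double {
    val pooled = a + b // single buffer, reshuffled in place on every round
    val observed = statistic(pooled, a.size)
    var atLeastAsExtreme = 0
    repeat(rounds) {
        shuffleInPlace(pooled, rng)
        if (statistic(pooled, a.size) >= observed) atLeastAsExtreme++
    }
    // The +1 terms count the observed arrangement itself, so the estimate is never exactly 0.
    return (atLeastAsExtreme + 1.0) / (rounds + 1.0)
}

fun main() {
    val cold = DoubleArray(10) { it.toDouble() }
    val warm = DoubleArray(10) { 100.0 + it }
    println(permutationPValue(cold, warm)) // small p-value: the samples clearly differ
    println(permutationPValue(cold, cold)) // identical samples: p-value of 1.0
}
```

Only one pooled buffer is allocated per test, and no permutation is ever materialized as a separate index array, which is the memory property the request asks for.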

Here is a (more complicated) description of classification by statistical tests.