stumpy-dev/stumpy

Multi-Window Finder

Opened this issue · 2 comments

It was brought to my attention that Keogh's lab wrote a nice short paper about identifying the "right" window sizes (beyond Pan Matrix Profiles) and I think it should be pretty straightforward to implement:

Paper
Code (Jupyter Notebook) and Data Sets

It would be great to see a notebook that reproduces this.

A few things caught my attention after taking a look at the paper/code. I am sharing them here to highlight them for future readers:

(1) The paper proposes an algorithm which, at its core, uses a function that takes a time series T and a window size m as inputs and returns a real value as output. Although the authors use the term dist (distance) for the returned value, another term would be better because the returned value can be negative.
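
As a quick illustration of why "distance" is a misleading name, the sketch below assumes the score has the log-of-sum form used in the authors' notebook (quoted further down in this thread) and evaluates it on a low-amplitude series. `moving_average` and `score` are hypothetical helper names, not the authors' functions, and the code is plain Python so it runs without dependencies.

```python
import math


def moving_average(series, m):
    """Mean of every length-m sliding window of `series`."""
    return [sum(series[i:i + m]) / m for i in range(len(series) - m + 1)]


def score(series, m):
    """Assumed form: log of the summed absolute deviation of the moving average."""
    ma = moving_average(series, m)
    center = sum(ma) / len(ma)
    return math.log(sum(abs(x - center) for x in ma))


# A sine wave with a small amplitude: the total deviation stays well below 1,
# so its logarithm is negative.
T = [0.001 * math.sin(2 * math.pi * t / 20) for t in range(200)]
value = score(T, m=5)
print(value)  # negative, which a true distance could never be
```

Scaling `T` up by a large constant would push the same score above zero, so the sign carries no meaning on its own.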

(2) The paper does not seem to mention z-normalization. So, one might be curious to explore whether the proposed algorithm still works when the subsequences differ substantially in their mean but are similar after z-normalization (e.g. subsequences {0.1, 0.2, 0.3} and {100, 200, 300}).
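
To make the z-normalization concern concrete, the short sketch below z-normalizes the two example subsequences and checks that they coincide afterwards, even though their means differ by three orders of magnitude. The helper name `z_normalize` is made up for illustration and is not part of the paper's code.

```python
import math
import statistics


def z_normalize(values):
    """Return (x - mean) / std for each x, using the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


a = [0.1, 0.2, 0.3]
b = [100.0, 200.0, 300.0]

za = z_normalize(a)
zb = z_normalize(b)

# The raw means differ by a factor of 1000 ...
print(statistics.fmean(a), statistics.fmean(b))

# ... yet the z-normalized shapes agree up to floating-point error.
same_shape = all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(za, zb))
print(same_shape)
```

A moving-average-based score sees the raw values, so these two subsequences would look very different to it unless the series is normalized first.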

(3) Algorithm 1, lines 3-4, show the following pseudo-code:

MA = moving-avg(T, w) // Algorithm 2
moving-dist ← 𝑆𝑢𝑚(𝐿𝑜𝑔(𝑎𝑏𝑠 (MA−𝑚𝑒𝑎𝑛(MA))))

However, the code shows the following line:

np.log(abs(moving_avg - (moving_avg).mean()).sum())

Note that sum and log are swapped. Maybe that's just a typo in placing parentheses, or there might be a reason behind the change. IMO, the paper's version makes sense, as it probably tries to rescale the extremely small or extremely large values in 𝑎𝑏𝑠 (MA−𝑚𝑒𝑎𝑛(MA)). The code's version, however, just takes the log of a single positive value; since log is monotonically increasing, this does not change where the minima fall and so does not affect the final outcome, AFAIU.
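
The difference between the two orderings can be checked numerically. The sketch below, in plain Python with hypothetical helper names, computes both variants over a few window sizes on a synthetic series. Because log is strictly increasing, the log-of-sum variant ranks window sizes exactly as the raw sum does, whereas the sum-of-log variant is a different quantity and need not agree. Skipping exact zeros before taking the log is an assumption of this sketch, not something the paper specifies.

```python
import math


def moving_average(series, m):
    """Mean of every length-m sliding window of `series`."""
    return [sum(series[i:i + m]) / m for i in range(len(series) - m + 1)]


def abs_deviations(series, m):
    """|MA - mean(MA)| for window size m."""
    ma = moving_average(series, m)
    center = sum(ma) / len(ma)
    return [abs(x - center) for x in ma]


def log_of_sum(series, m):
    """Variant in the notebook: log(sum(|MA - mean(MA)|))."""
    return math.log(sum(abs_deviations(series, m)))


def sum_of_log(series, m):
    """Variant in the paper's pseudo-code: sum(log(|MA - mean(MA)|))."""
    # log(0) is undefined, so exact zeros are skipped (assumption of this sketch).
    return sum(math.log(d) for d in abs_deviations(series, m) if d > 0)


def ranking(values):
    """Indices of `values` sorted from smallest to largest value."""
    return sorted(range(len(values)), key=lambda k: values[k])


# Synthetic series: two superimposed sine waves (made-up test data).
T = [math.sin(2 * math.pi * t / 24) + 0.3 * math.sin(2 * math.pi * t / 7.3)
     for t in range(300)]
window_sizes = list(range(4, 60, 4))

raw = [sum(abs_deviations(T, m)) for m in window_sizes]
los = [log_of_sum(T, m) for m in window_sizes]
sol = [sum_of_log(T, m) for m in window_sizes]

# Taking the log after the sum cannot reorder the window sizes.
print(ranking(raw) == ranking(los))

# Taking the log before the sum may or may not give the same ordering.
print(ranking(raw) == ranking(sol))
```

If the rest of the algorithm only looks at where the minima occur, the notebook's log is cosmetic, while switching to the paper's form could change which window sizes are selected.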

(4) Algorithm 1, lines 8-11, show:

for i in local-min do
    𝑟𝑒𝑠 ← 𝑀𝑠 [𝑖]/(𝑖 +1)
end for
𝑀 = 𝑚𝑒𝑎𝑛(res)

The code shows:

for i in range(3):
    reswin.append(window_sizes[b[i]]/ (i+1))
reswin = np.array(reswin)
winTime = 0.8 * reswin[0] + 0.15 * reswin[1] + 0.05 * reswin[2]

Why 3, and why the weights 0.8, 0.15, and 0.05?
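
To see how much the two aggregation rules can differ, here is a small sketch with made-up window sizes at the first three local minima. It reads `i` as the rank of the local minimum, as the notebook's `b[i]` indexing suggests; that reading of the pseudo-code is an assumption, and the input numbers are purely illustrative.

```python
# Hypothetical window sizes at the first three local minima (made-up numbers,
# chosen to sit near 1x, 2x and 3x a period of about 25).
minima_window_sizes = [25, 52, 74]

# Each candidate is divided by its 1-based rank.
res = [w / (i + 1) for i, w in enumerate(minima_window_sizes)]

# Paper, Algorithm 1: plain mean of the candidates.
paper_estimate = sum(res) / len(res)

# Notebook: fixed weights on exactly three candidates.
weights = [0.8, 0.15, 0.05]
code_estimate = sum(wt * r for wt, r in zip(weights, res))

print(res)
print(paper_estimate, code_estimate)
```

The weights sum to 1, so the notebook's rule is a weighted mean that leans heavily on the first minimum, while the paper's rule treats every local minimum equally and does not cap their number at three.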

> Note that sum and log are swapped.

I noticed this too, and this inconsistency scares me. I think we really need to take special care when trying this out and make sure we understand and test everything before adding it.