Multi-Window Finder

Question

Opened this issue 10 months ago · 2 comments

It was brought to my attention that Keogh's lab wrote a nice short paper about identifying the "right" window sizes (beyond Pan Matrix Profiles) and I think it should be pretty straightforward to implement:

Paper
Code (Jupyter Notebook) and Data Sets

It would be great to see a notebook that reproduces this.

Answer 1 · 2025-01-21T18:23:47.000Z

A few things caught my attention after looking at the paper and code. Sharing them here for future readers:

(1) At its core, the paper's algorithm uses a function that takes a time series T and a window size m as inputs and returns a real value. The authors call the returned value dist (distance), but another term would be better, since the returned value can be negative.

(2) The paper does not seem to mention z-normalization. So, one might be curious whether the proposed algorithm still works when subsequences differ substantially in their averages but are similar after z-normalization (e.g. subsequences {0.1, 0.2, 0.3} and {100, 200, 300}).
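To make point (2) concrete, here is a minimal sketch using the two subsequences from the example. The helper name `znorm` is mine and does not come from the paper or the notebook.

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence: zero mean, unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

a = np.array([0.1, 0.2, 0.3])
b = np.array([100.0, 200.0, 300.0])

# The raw averages differ by three orders of magnitude ...
print("means:", a.mean(), b.mean())

# ... but the z-normalized shapes coincide.
print("same shape after z-norm:", np.allclose(znorm(a), znorm(b)))
```

Since the score in the paper is built from moving averages of the raw series, it sees the difference in level, whereas a z-normalized distance would treat these two subsequences as the same pattern.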

(3) Lines 3-4 of Algorithm 1 show the following pseudo-code:

MA = moving-avg(T, w) // Algorithm 2
moving-dist ← Sum(Log(abs(MA − mean(MA))))

However, the code shows the following line:

np.log(abs(moving_avg - (moving_avg).mean()).sum())

Note that sum and log are swapped. Maybe that's just a typo in placing parentheses, or there might be a deliberate reason behind the change. IMO, the paper's version makes sense, as it probably tries to change how the extremely small or extremely large values in abs(MA − mean(MA)) contribute to the total. The code's version, however, just takes the log of a single positive value. Since log is monotonically increasing, that does not change where the minima fall, so it should not affect the final outcome AFAIU.
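Here is a rough sketch for comparing the two variants side by side. It assumes that moving-avg is a plain (unweighted) moving average, which I have not checked against Algorithm 2; the function names and the synthetic series are mine.

```python
import numpy as np

def moving_avg(T, w):
    """Plain moving average (assumption: this matches the paper's Algorithm 2)."""
    return np.convolve(T, np.ones(w) / w, mode="valid")

def score_paper(T, w):
    """Paper's order: sum of logs. Note: yields -inf if any deviation is exactly 0."""
    ma = moving_avg(T, w)
    return np.sum(np.log(np.abs(ma - ma.mean())))

def score_notebook(T, w):
    """Notebook's order: log of the sum."""
    ma = moving_avg(T, w)
    return np.log(np.abs(ma - ma.mean()).sum())

def sum_abs_dev(T, w):
    """The quantity inside the notebook's log, for reference."""
    ma = moving_avg(T, w)
    return np.abs(ma - ma.mean()).sum()

# Synthetic low-amplitude sine with period 50 plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
T = 0.01 * np.sin(2 * np.pi * t / 50) + 0.001 * rng.standard_normal(t.size)

windows = np.arange(10, 151)
paper = np.array([score_paper(T, w) for w in windows])
notebook = np.array([score_notebook(T, w) for w in windows])
raw = np.array([sum_abs_dev(T, w) for w in windows])

print("paper score range:   ", paper.min(), paper.max())
print("notebook score range:", notebook.min(), notebook.max())
print("argmin window (paper / notebook / no log):",
      windows[paper.argmin()], windows[notebook.argmin()], windows[raw.argmin()])
```

The paper's variant is negative throughout on this series because every deviation is below 1, which is also what point (1) is about. The notebook's variant is a monotone transform of the plain sum of absolute deviations, so dropping the log would leave the positions of its minima unchanged.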

(4) Lines 8-11 of Algorithm 1 show:

for i in local-min do
    res ← ws[i]/(i+1)
end for
w = mean(res)

The code shows:

for i in range(3):
    reswin.append(window_sizes[b[i]]/ (i+1))
reswin = np.array(reswin)
winTime = 0.8 * reswin[0] + 0.15 * reswin[1] + 0.05 * reswin[2]

Why 3 minima, and why the weights 0.8, 0.15, 0.05?
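To see how much the two aggregation rules can differ, here is a small sketch. It assumes that `b` in the notebook holds the indices of the local minima in order, and that the paper's `i` in the divisor is the rank of the minimum (as in the notebook) rather than its index into ws; both are my reading, not something the paper states. The function names and the example indices are made up for illustration.

```python
import numpy as np

def window_paper(window_sizes, minima_idx):
    """Paper's rule (as I read it): plain mean over all local minima."""
    est = [window_sizes[idx] / (k + 1) for k, idx in enumerate(minima_idx)]
    return float(np.mean(est))

def window_notebook(window_sizes, minima_idx, weights=(0.8, 0.15, 0.05)):
    """Notebook's rule: fixed weights over the first three local minima."""
    est = [window_sizes[minima_idx[k]] / (k + 1) for k in range(len(weights))]
    return float(np.dot(weights, est))

# Hypothetical case: minima near 1x, 2x and 3x a period of about 20.
window_sizes = np.arange(10, 100)
minima_idx = [10, 32, 47]  # window sizes 20, 42, 57

print("paper rule:   ", window_paper(window_sizes, minima_idx))
print("notebook rule:", window_notebook(window_sizes, minima_idx))
```

When the minima sit at clean multiples of one period, both rules agree. They diverge when the later minima are noisy, since the notebook's weights lean heavily on the first minimum, and the notebook's rule fails outright if fewer than three minima are found.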

Answer 2 · 2025-01-21T18:35:06.000Z

Note that sum and log are swapped.

I noticed this too, and the inconsistency scares me. I think we need to take special care when trying this out, and really understand and test everything before adding it.