Parallelize the DTW process
Closed this issue · 7 comments
I have around 200 time series, but they are of length 30,000. The DBA algorithm just doesn't converge (as expected). Would there be a way to parallelize the DTW process within your algorithm?
Since I am not from cs background (mechanical engineer), my foundations in time series analysis and coding are a bit weak. However, if you could give me quick walk through on what needs to be done, I suppose I should be able to figure out the solution,
Hi Deepak,
Because DTW's complexity is quadratic with the length, parallelism isn't really going to help with length 30,000. However, you could add a warping window, which would turn it to something manageable. The Cython and the Java (DBAWarpingWindow) versions can do it (the 'w' parameter).
The higher the value of w, the more warping you allow. The complexity then transforms into L.(2w+1) instead of L². With a length of 30,000 I would aim for w=50 or so but it would depend on your application.
F
Thank you for your reply Dr. Petitjean. While trying the warping window approach for series with varying lengths, I just got NANs in my output. I am further investigating the issue but I guess this approach won't work unless the series have same lengths.
Hi Deepak,
There is no theoretical problem with doing average for sequences with different lengths. I have done it in the Python version (https://github.com/fpetitjean/DBA/blob/master/DBA.py). Is the problem with that code that you also want to use a warping window and/or multivariate series? If yes let me know and I'll add it to my todolist.
F
Which language are you using?
Basically there are 3 main functionalities but unfortunately I've developed them as people have asked and I haven't had time to create a single version with all of them. These are:
- series with multiple lengths
- multi-variate series (series where each element has multiple values)
- warping window: additional parameter that allows warping to be limited (and complexity greatly reduced)
Number 1 is in DBA.py, number 2 is in DBA_multi-variate.py and number 3 is in DBAWarpingWindow.java.
I would start from DBA_multi-variate.py and add:
- make data be in an array of 2D matrices, instead of in a 3D matrix, which would give you functionality 1
- adapt the recurrence in the for i, for j in the loops from java to python with the additional parameter to get functionality 3.
This all would only take me 40min or so but I unfortunately don't have that in the next week or so :(
Cheers,
François