Questions about DTAN.
yjhong89 opened this issue · 5 comments
Hello,
I am YJHong.
I came across your nice work while trying to solve a time series matching problem.
My problem is only sequence matching; I don't need to classify the sequences after aligning them.
I have a few questions about DTAN.
- What if the sequences have variable lengths? The sequences in the UCR datasets seem to have almost fixed lengths, but in my case the lengths vary a lot. The first option I thought of is zero-padding the short sequences to the maximum length in a batch, as is done in speech tasks. But that may need some constraint indicating which parts of each sequence in the batch are zero-padded. Can you give me some guidance on this?
- In DTAN, the input signals U_i are aligned through f_loc and CPAB. What I want to see, though, is a kind of warping path from a reference signal to a target signal. Is that possible in DTAN?
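To make the first question concrete, here is a minimal sketch of the padding-plus-mask idea in plain Python. The helper names `pad_batch` and `masked_mse` are hypothetical and are not part of DTAN; the sketch only shows how padded positions could be excluded from a loss, and it does not address how the mask should be carried through the warp.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    seqs: Sequence[Sequence[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[List[int]]]:
    """Right-pad every sequence to the batch maximum and return a 0/1 mask."""
    max_len = max(len(s) for s in seqs)
    padded, mask = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        padded.append(list(s) + [pad_value] * n_pad)
        # 1 marks a real sample, 0 marks padding
        mask.append([1] * len(s) + [0] * n_pad)
    return padded, mask


def masked_mse(x: Sequence[float], y: Sequence[float], mask: Sequence[int]) -> float:
    """Mean squared error over the positions where mask == 1 only."""
    num = sum(m * (a - b) ** 2 for a, b, m in zip(x, y, mask))
    den = sum(mask)
    return num / den if den else 0.0


batch = [[1.0, 2.0, 3.0], [4.0, 5.0]]
padded, mask = pad_batch(batch)
# Compare the two signals only where the shorter one has real samples.
loss = masked_mse(padded[0], padded[1], mask[1])
```
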
Thank you in advance.
Regards.
Hi YJHong,
Thanks for your feedback! I'm glad you found our work useful :)
Regarding your questions:
- Variable length (VL) input: as mentioned in the paper under the "Variable length and multi-channel data" section, adjusting DTAN to VL input requires an f_loc that can handle such data (RNN?), removing the boundary condition from CPAB, and a proper loss function (SDTW?).
Another solution, as you suggested, is adjusting the input signals so that they are of equal length (zero-padding, for instance). You might want to check how the good people of the UCR archive preprocessed their data.
Imposing a constraint is another solution; perhaps try adjusting the smoothness prior so that signals with more zero-padding are allowed larger deformations? (just thinking out loud here).
- DTAN is built for the joint alignment of signals. Therefore, there's no "reference signal". That being said, you could adjust the loss function so that the loss is computed between a batch of signals and a reference signal rather than the batch mean signal. Afterwards, simply plot the index mapping between them.
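A rough sketch of the second suggestion, in plain Python. Both function names are hypothetical, and `warp` is an assumption: it stands for the learned transformation evaluated at each source sample and normalized to [0, 1]; how to obtain it from the CPAB code is not shown here.

```python
from typing import List, Sequence


def reference_loss(
    aligned: Sequence[Sequence[float]], reference: Sequence[float]
) -> float:
    """Mean squared distance from each aligned signal to a fixed reference
    (instead of the batch mean signal)."""
    total = 0.0
    for v in aligned:
        total += sum((a - r) ** 2 for a, r in zip(v, reference)) / len(reference)
    return total / len(aligned)


def index_mapping(warp: Sequence[float], target_len: int) -> List[int]:
    """Turn warped positions in [0, 1] into target indices.

    warp[i] is assumed to be the warped position of source sample i.
    """
    last = target_len - 1
    return [min(last, max(0, round(p * last))) for p in warp]


loss = reference_loss([[1.0, 2.0], [3.0, 4.0]], [1.0, 2.0])
path = index_mapping([0.0, 0.1, 0.4, 1.0], target_len=5)
# Plotting range(len(path)) against path gives a warping-path-style picture.
```
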
Hope this helps! If so let me know and I'll close this issue.
Thanks for your answers!
I'm OK with closing the issue.
@ronshapiraweber
I re-read the DTAN paper (especially the "Variable length and multi-channel data" subsection) and would like to ask a follow-up. (Sorry for asking after the issue was closed.)
Suppose there are N sequences with different lengths.
First, I would design f_loc with an RNN/TCN, which can handle variable-length sequences.
Then theta_i would come out of f_loc, and V_i would be generated via CPAB.
My question is: do the lengths of the V_i differ from each other because the lengths of the U_i differ?
If so, the loss function in equation (5) (the single-class case) would be inappropriate, so I would need a proper loss function such as SoftDTW instead of the l2 loss.
Am I getting it right?
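For reference, a minimal forward-pass sketch of a soft-DTW value between two sequences of different lengths, in plain Python. This is only an illustration of the recursion, with a squared-difference local cost as an assumption; it is unvectorized, and a trainable loss would need an implementation that supports gradients.

```python
import math
from typing import Sequence


def soft_min(values: Sequence[float], gamma: float) -> float:
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    if math.isinf(m):
        return m
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Soft-DTW value via dynamic programming; x and y may differ in length."""
    n, m = len(x), len(y)
    inf = float("inf")
    # r[i][j] holds the soft cost of aligning x[:i] with y[:j]
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            r[i][j] = cost + soft_min(
                (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]), gamma
            )
    return r[n][m]


# Sequences of length 4 and 2 that match after warping give a value near 0.
value = soft_dtw([0.0, 0.0, 1.0, 1.0], [0.0, 1.0], gamma=0.01)
```

The smaller `gamma` is, the closer the result gets to ordinary DTW; larger values smooth the minimum and make the value differentiable in practice.
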
Thanks.
Regards.
@yjhong89
Yes, you got it right :)
Keep in mind that you will also have to "turn off" the boundary condition on the CPAB transformation (U_i[0]=V_i[0] and U_i[n]=V_i[n], 'n' being the sequence length). That way, the output signal does not have to be of the same length.
Note: you will have to compute SDTW for each batch, which might be very costly. Take this into consideration when implementing the code.
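To get a feel for that cost, here is a back-of-the-envelope count in plain Python. It assumes every unordered pair of signals in a batch is compared once, which is only one possible way to build the loss; the function name is hypothetical. The one firm fact used is that a DTW-style dynamic program fills one cell per pair of time steps, so a single comparison of lengths n and m costs n * m cells.

```python
from typing import Sequence


def sdtw_cells_per_batch(lengths: Sequence[int]) -> int:
    """Count DP cells if every unordered pair in the batch is compared once."""
    total = 0
    for i in range(len(lengths)):
        for j in range(i + 1, len(lengths)):
            total += lengths[i] * lengths[j]
    return total


cells = sdtw_cells_per_batch([100, 200, 300])
```
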
Cheers and good luck!
Ron
@ronshapiraweber Thanks!