How to use DBA
Hi,
I have 10,000 biological time series, each 12,000 points long.
It seems DTW is prohibitively expensive while DBA is very promising.
Can you teach me how to use DBA to speed up calculating the distances and clustering all the data?
Thanks a lot.
Huanle
PS: I am a biologist and know nothing beyond a bit of Python programming.
Hi,
10,000 time series is no problem for DBA, but a length of 12,000 might require some adaptation to avoid storing the whole cost matrix (12,000 x 12,000) in memory. Did you try DTW as well?
In your case, do you have some knowledge about how much warping you might expect? Can I ask what type of time series these are?
Best,
François
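To put the memory concern in numbers, here is a minimal sketch (not code from the DBA repository). The function names are hypothetical, and the squared pointwise cost, the final square root and the 8-byte float size are assumptions. It estimates the size of a dense 12,000 x 12,000 matrix and computes a DTW distance while keeping only two rows in memory. The two-row trick yields the distance only; DBA also needs the alignment path, which this sketch does not retain.

```python
import numpy as np

def full_matrix_bytes(length, itemsize=8):
    """Bytes needed for a dense length x length matrix (8-byte floats assumed)."""
    return length * length * itemsize

def dtw_distance_two_rows(a, b):
    """DTW distance using O(len(b)) memory instead of O(len(a) * len(b)).

    Assumes a squared pointwise cost and returns the square root of the
    accumulated cost. Only the distance is returned, not the warping path.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # prev[j] holds the accumulated cost of the previous row; index 0 is a border.
    prev = np.full(len(b) + 1, np.inf)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        curr = np.full(len(b) + 1, np.inf)
        for j in range(1, len(b) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor cells.
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return float(np.sqrt(prev[len(b)]))

if __name__ == "__main__":
    print(full_matrix_bytes(12_000) / 1e9, "GB for one dense matrix")
    print(dtw_distance_two_rows([1, 2, 3], [1, 2, 2, 3]))
```

A pure-Python double loop like this is far too slow for 12,000-point series; it is only meant to show why the full matrix is not needed for the distance itself.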
@fpetitjean ,
thanks heaps for your reply.
I have tried DTW from mlpy.
It takes forever to compute the distance between any pair of series longer than 8,000 points.
I have no idea how much warping to expect.
The data are simply signals measured for DNA nucleotides.
You can imagine it as scanning each nucleotide consecutively along a 10 kb DNA molecule and storing the value of a certain feature in an array.
I hope it makes some sense to you.
Best,
Huanle
@Huanle It makes some sense, though not completely. So you have a real-valued measurement along DNA sequences of about 12,000 bases?
The time DTW takes to compute grows quadratically with the length (O(L²)), but if you allow, say, a maximum warping of 20%, the calculation can be reduced several-fold. Do you have any knowledge about how far apart in the sequences 2 bases could be matched?
I tried it and it works for 12,000-length series with no problem: the warping window makes it manageable (the smaller the window, the more manageable). I'll close this issue, but you can contact me directly at francois.petitjean@monash.edu.
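As an illustration of the warping-window idea, here is a minimal sketch of DTW restricted to a band around the diagonal (a Sakoe-Chiba band). It is not the code used in the test above. The function name and the `window_fraction` parameter are hypothetical, and reading "a maximum warping of 20%" as a band half-width of 20% of the series length is an assumption, as are the squared pointwise cost and the final square root.

```python
import numpy as np

def dtw_distance_windowed(a, b, window_fraction=0.2):
    """DTW distance restricted to a band around the diagonal.

    Returns (distance, number_of_cells_evaluated). Assumes a squared
    pointwise cost and takes the square root of the accumulated cost.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Band half-width in samples; at least |n - m| so the last cell stays reachable.
    w = max(int(window_fraction * max(n, m)), abs(n - m))
    prev = np.full(m + 1, np.inf)
    prev[0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        curr = np.full(m + 1, np.inf)
        lo = max(1, i - w)
        hi = min(m, i + w)
        # Only cells with |i - j| <= w are evaluated; the rest stay infinite.
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        cells += hi - lo + 1
        prev = curr
    return float(np.sqrt(prev[m])), cells

if __name__ == "__main__":
    x = np.sin(np.linspace(0, 6, 200))
    y = np.sin(np.linspace(0, 6, 200) + 0.3)
    print(dtw_distance_windowed(x, y, window_fraction=0.2))
    print(dtw_distance_windowed(x, y, window_fraction=1.0))
```

Under these assumptions, a 20% band on two 12,000-point series covers about 36% of the 144 million cells of the full matrix, and narrower windows shrink that further. The windowed distance can never be smaller than the unconstrained one, so a window that is too narrow may miss the best alignment.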
Hi @fpetitjean ,
sorry to bother you again.
But were you, by any chance, able to receive my email?
It would be nice to have your help with this specific issue.
Thanks heaps in advance. :)