fastdtw function is slower than in DynamicTimeWarping.jl
maetshju opened this issue · 8 comments
Hi there!
It makes me really happy that this package exists because I've been getting frustrated with the speed of using dynamic time warping on large datasets in Python. While I was comparing this package to the older DynamicTimeWarp.jl package, I found that the fastdtw function is slower here than it is in the older package. I'm a little confused about why this might be, because my understanding is that this package is based on DynamicTimeWarp.jl.
When I time the fastdtw functions using the @btime
macro from BenchmarkTools.jl, I get approximately 14 ms for the function from DynamicTimeWarp.jl, and approximately 76 ms for the function from TimeWarp.jl.
This is the script, assuming you have two sounds "sound1.wav" and "sound2.wav":
using TimeWarp, DynamicTimeWarp, BenchmarkTools, Compat, WAV
s1, sf1 = wavread("sound1.wav", format="double")
s2, sf2 = wavread("sound2.wav", format="double")
s1 = reshape(s1, size(s1)[1])
s2 = reshape(s2, size(s2)[1])
@btime DynamicTimeWarp.fastdtw($s1, $s2, 1) # 14 ms
@btime TimeWarp.fastdtw($s1, $s2, 1) # 76 ms
I was just wondering if this is expected behavior, or if there might be a simple fix for this.
Does this performance difference get much worse/larger for longer time series?
I ran it on some longer audio files (around 1000 times longer), and it looks like the ratio of the difference is about constant.
Package | @btime output |
---|---|
DynamicTimeWarp.jl | 17.744 s (288721458 allocations: 22.40 GiB) |
TimeWarp.jl | 82.779 s (2828198599 allocations: 52.71 GiB) |
FYI, if you still need a fast implementation in Python, check out **pydtw
which uses the UCR Suite optimisations.
The struct definition
struct Sequence{N,T} <: AbstractArray{T,1}
val::AbstractArray{T,N}
end
has an abstractly typed field, meaning that all access to the vectors will incur the cost of dynamic dispatch. Parameterizing the struct on the vector type would solve this and likely speed up all algorithms that take a Sequence
as input.
Changing the struct definition to
struct Sequence{N,T} <: AbstractArray{T,1}
val::Array{T,N}
end
gave me a significant speedup, which I believe is a way to implement @baggepinnen's suggestion. With the following script, I got 7.518 ms runtime for the modified Sequence
definition and a 152.005 ms runtime for the original definition.
using TimeWarp, BenchmarkTools
# formula for pure tone: A*sin(2π * f * t)
# where A is amplitude,
# f is the frequency of the tone measured in Hz
# t is the time of the tone measured in s
const sf = 44100 # sampling frequency of 44.1 kHz
# synthesize a 440 Hz pure tone sampled at 44.1 kHz
# duration of 0.4 s (400 ms)
# 1 * sin(2π * 440 Hz * t)
s1_t = range(0.4, length=ceil(Int64, (.4 * sf)))
s1 = sin.(2π * 440 .* s1_t)
# synthesize a 600 Hz pure tone sampled at 44.1 kHz
# duratoin 0f 0.8 s (800 ms)
# 1 * sin(2π * 600 Hz * t)
s2_t = range(0.8, length=ceil(Int64, (.8 * sf)))
s2 = sin.(2π * 600 .* s2_t)
@btime fastdtw($s1, $s2, 1)
I am experimenting with some additional performance improvements over at https://github.com/baggepinnen/DynamicAxisWarping.jl
I got rid of the Sequence
completely. It was only used to get a custom indexing behaviour, so I replaced it with a custom indexing method instead. Having to wrap and unwrap vectors is a bit of a hassle and this got rid of all wrapper methods
On a somewhat unrelated note, have you seen the recent paper
https://arxiv.org/pdf/2003.11246.pdf
"FastDTW is approximate and Generally Slower than the Algorithm it Approximates"
I will have to start watching your fork. Interesting paper there too; matches some behavior I had been casually observing with speech recognition data.