Global alignment kernel returns NaN for all timeseries
NAThompson opened this issue · 1 comments
Describe the bug
gak(x,y)
returns nan
for all x,y
.
To Reproduce
import random
import numpy
from math import pi as π
from tslearn.metrics import sigma_gak, gak
def test_reproduce():
f0 = 20e9
period = 1/f0
ω0 = 2*π*f0
# works at length 405; dies at 410:
waveforms = numpy.empty(shape=(5, 410))
times = numpy.linspace(-period/2, period/2, waveforms.shape[1])
for i in range(waveforms.shape[0]):
φ = random.gauss(0.0, 0.5)
k = 3 + random.uniform(-0.5, 0.5)
waveforms[i, :] = 0.5*(numpy.tanh(k*ω0*times + φ) + 1) + random.uniform(-0.05, 0.05)
typical_values = 0.5*(numpy.tanh(3*ω0*times) + 1)
σ = sigma_gak(typical_values)
scores = numpy.empty(waveforms.shape[0])
for i in range(waveforms.shape[0]):
scores[i] = gak(typical_values, waveforms[i, :], sigma=σ)
print(scores)
print(gak(waveforms[0, :], waveforms[0, :], sigma=σ))
Expected behavior
The computation should not return NaN; maybe it needs to be stabilized with the log-sum-exp method?
Environment (please complete the following information):
- OS: ARM MacOS
- tslearn version: 0.6.3
Hello @NAThompson,
Thank you for your issue.
Indeed, there is an overflow which can occur if the function gak
is run on long time series which have very few variations.
We have: gak(s1, s2) = unnormalized_gak(s1, s2) / sqrt(unnormalized_gak(s1, s1) * unnormalized_gak(s2, s2))
For example, if s1
is a constant time series (worst possible scenario) of size n
, the function unnormalized_gak(s1, s1)
will return the number of possible warpings to match s1
with itself, each warping being optimal.
This number is equal to the number of paths in a grid of n
rows and n
columns to go from the top left cell to the bottom right cell when the only possible moves are going right, down or diagonally (right and down).
When s1
is a constant time series, unnormalized_gak(s1, s1)
returns a float when n <= 405
an returns inf
when n >= 406
.
If s1
and s2
are both constant time series of size n
, the product unnormalized_gak(s1, s1) * unnormalized_gak(s2, s2)
returns a float when n <= 204
and returns inf
when n >= 205
.