Question on LevelShiftAD

Closed this issue 3 years ago · 2 comments

I have created a simple example.

from adtk.detector import LevelShiftAD
from adtk.visualization import plot
import pandas

d1 = [1, 10000, 1,     10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 1]
d2 = [1, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 1]
s = pandas.Series(d1, index=pandas.date_range("2021-01-01", periods=len(d1)))

level_shift_ad = LevelShiftAD(c=6.0, side='both', window=2)
anomalies = level_shift_ad.fit_detect(s)

plot(s, anomaly=anomalies, anomaly_color='red');

With d2 two anomalies are detected.
With d1 no anomalies are detected.

Why?
Or maybe the question must be: What should I look at to understand? :)

Thanks

Answer 1 · 2021-07-20T15:26:27.000Z

Hi @phaabe

This is because the data set you are using is very small.

However, if you use a data set that is a bit larger, you should see the results you were expecting. Transformations and computations of this nature do not always behave as you would expect on very small samples.

You are using 14 values, try with 16 and you will see the result you were expecting.

d1 = [1, 10000, 1,     10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 1]
d1Longer = [1, 10000, 1, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 1]
s = pandas.Series(d1Longer, index=pandas.date_range("2021-01-01", periods=len(d1Longer)))
level_shift_ad = LevelShiftAD(c=6.0, side='both', window=2)
anomalies = level_shift_ad.fit_detect(s)
plot(s, anomaly=anomalies, anomaly_color='red')

Internally the LevelShiftAD algorithm runs through a number of transforms on the data.
The DoubleRollingAggregate is the first and it will calculate very different values for d1 and d2 even in the first RollingAggregate, never mind the second.

s = pandas.Series(d1, index=pandas.date_range("2021-01-01", periods=len(d1)))
s.rolling(2).median()

s = pandas.Series(d2, index=pandas.date_range("2021-01-01", periods=len(d2)))
s.rolling(2).median()

As you can see after running the snippets above, d1 and d2 are markedly different after the first rolling aggregate. I will not go so far as to break down each computation and its output, as internally the Pipenet is quite complex.
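For anyone who wants a rough picture of why a couple of extra points can change the outcome, here is a minimal sketch in plain Python. It rests on assumptions, not on the ADTK source: that the double rolling step compares the median of the `window` points before each position with the median of the `window` points starting at it, and that the absolute differences are then checked against an interquartile-range threshold of the form Q3 + c * IQR. The helper names `window_median_diffs`, `quantile` and `iqr_flags` are made up for illustration and are not part of ADTK.

```python
from statistics import median

d1 = [1, 10000, 1] + [10000] * 10 + [1]         # 14 points, as in the question
d2 = [1] + [10000] * 12 + [1]                   # 14 points, as in the question
d1_longer = [1, 10000, 1] + [10000] * 12 + [1]  # 16 points, as in the answer


def window_median_diffs(values, window=2):
    """Absolute difference between the median of the `window` points before
    position t and the median of the `window` points starting at t.
    (Assumed behaviour of the double rolling step, not ADTK's own code.)"""
    diffs = []
    for t in range(window, len(values) - window + 1):
        left = median(values[t - window:t])
        right = median(values[t:t + window])
        diffs.append(abs(right - left))
    return diffs


def quantile(ordered, q):
    """Quantile of an already sorted list, using linear interpolation
    (the default interpolation in pandas)."""
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def iqr_flags(diffs, c=6.0):
    """Return the upper threshold Q3 + c * IQR and a flag per difference.
    (Assumed form of the threshold step.)"""
    ordered = sorted(diffs)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    upper = q3 + c * (q3 - q1)
    return upper, [d > upper for d in diffs]


for name, data in [("d1", d1), ("d2", d2), ("d1_longer", d1_longer)]:
    upper, flags = iqr_flags(window_median_diffs(data))
    print(name, "threshold:", upper, "flagged:", sum(flags))
```

Under these assumptions, d1 produces three non-zero differences out of eleven. That is more than a quarter of the sample, so Q3 and the IQR become large and the threshold ends up above every difference. In d2 only two of eleven differences are non-zero, and in the 16-point series only three of thirteen, so Q3 stays at zero and every non-zero difference is flagged. The real detector has further steps (a sign check, for example), so treat this as an illustration of the sample-size effect only.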

Answer 2 · 2021-07-21T08:49:53.000Z

Great, thanks