This code builds on the implementation in Nacho Navarro's repository. I have extended that code to:
- add the possibility to perform one-sided tests, both for positive and negative anomalies;
- use sample standard deviations (`np.std(x, ddof=1)`).
Seasonal ESD is an anomaly detection algorithm implemented at Twitter: https://arxiv.org/pdf/1704.07706.pdf.
The algorithm uses the Extreme Studentized Deviate (ESD) test, also known as Grubbs's test, to detect anomalies. The novelty lies not in the use of ESD itself, but in what it is applied to.
The problem with the ESD test on its own is that it assumes normally distributed data, while real-world data can have a multimodal distribution. To circumvent this, STL (Seasonal and Trend decomposition using Loess) is used to decompose the time series into seasonal, trend, and residual components. The key is that the residual has a unimodal distribution that ESD can test.
However, there is still the problem that extreme, spurious anomalies can corrupt the residual component. To fix this, the paper proposes using the median of the series to represent the "stable" trend, instead of the trend estimated by STL decomposition.
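
The residual described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the seasonal component is estimated with simple per-phase means as a stand-in for STL, and the names `median_residual` and `period` are hypothetical.

```python
import numpy as np


def median_residual(ts, period):
    """Residual = series - seasonal estimate - median of the series."""
    ts = np.asarray(ts, dtype=float)
    median = np.median(ts)
    phase = np.arange(ts.size) % period

    # Stand-in for STL (assumption): the seasonal effect at each phase is
    # the mean of the median-centered values observed at that phase.
    centered = ts - median
    phase_means = np.array([centered[phase == p].mean() for p in range(period)])
    phase_means -= phase_means.mean()  # seasonal effects average to zero
    seasonal = phase_means[phase]

    # The median replaces the trend component as the "stable" level.
    return ts - seasonal - median


# Four cycles of a sine wave with one injected spike at index 20.
t = np.arange(48)
series = 10.0 + 3.0 * np.sin(2 * np.pi * t / 12)
series[20] += 15.0

resid = median_residual(series, period=12)
worst = int(np.argmax(np.abs(resid)))  # position of the largest residual
```

Because the median is barely moved by a single spike, the spike stays clearly visible in the residual instead of being partly absorbed into the level estimate.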
Finally, for data sets that have a high percentage of anomalies, the paper proposes using the median and the Median Absolute Deviation (MAD) instead of the mean and standard deviation to compute the z-score. Because the median and MAD are far less sensitive to extreme values, they give more robust estimates of the location and spread of a time series with a high percentage of anomalies.
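
To make the difference concrete, here is a small sketch comparing the classic z-score with the median/MAD variant. The function name `zscores` and its `hybrid` flag are hypothetical, chosen for illustration only, and the MAD is used unscaled (an assumption of this sketch).

```python
import numpy as np


def zscores(x, hybrid=False):
    """Absolute z-scores using mean/std, or median/MAD when hybrid=True."""
    x = np.asarray(x, dtype=float)
    if hybrid:
        center = np.median(x)
        scale = np.median(np.abs(x - center))  # unscaled MAD
    else:
        center = x.mean()
        scale = x.std(ddof=1)  # sample standard deviation
    return np.abs(x - center) / scale


# Illustrative data: two of the eight points (25%) are anomalous.
data = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 9.0, 9.5])

classic = zscores(data)
robust = zscores(data, hybrid=True)
```

In this example the two anomalies inflate the mean and the standard deviation, so their classic z-scores stay small, while the median/MAD scores separate them clearly from the rest.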
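Grubbs's test is defined for the hypotheses:
- $H_0$: there are no outliers in the data set;
- $H_a$: there is exactly one outlier in the data set.

The Grubbs test statistic is defined as:
$$ G = \frac{\max_{i = 1, \ldots, N} \left| Y_i - \bar{Y} \right|}{s} $$
where $\bar{Y}$ and $s$ denote the sample mean and the sample standard deviation, respectively.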
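This is the two-sided test, for which the hypothesis of no outliers is rejected at significance level $\alpha$ if
$$ G > \frac{N - 1}{\sqrt{N}} \sqrt{\frac{t^{2}_{\alpha / (2N), N-2}}{N - 2 + t^{2}_{\alpha / (2N), N-2}}} $$
with $t_{\alpha / (2N), N-2}$ denoting the upper critical value of the t-distribution with $N - 2$ degrees of freedom at a significance level of $\alpha / (2N)$.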
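The Grubbs test can also be defined as a one-sided test, replacing $\alpha / (2N)$ with $\alpha / N$, with
$$ G = \frac{Y_{\max} - \bar{Y}}{s} $$
to test whether the maximum value is an outlier (a positive anomaly), or with
$$ G = \frac{\bar{Y} - Y_{\min}}{s} $$
to test whether the minimum value is an outlier (a negative anomaly).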
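
The two-sided and one-sided variants of a single Grubbs test can be sketched as below. This is an illustrative sketch, not the repository's code: the function name `grubbs_test` and the `side` values `"two-sided"`, `"max"`, and `"min"` are hypothetical, and the critical value of the t-distribution is obtained from `scipy.stats.t.ppf`.

```python
import numpy as np
from scipy import stats


def grubbs_test(y, alpha=0.05, side="two-sided"):
    """Return (G, critical value, reject H0?) for a single Grubbs test."""
    y = np.asarray(y, dtype=float)
    n = y.size
    mean = y.mean()
    s = y.std(ddof=1)  # sample standard deviation

    if side == "two-sided":
        g = np.max(np.abs(y - mean)) / s
        p = alpha / (2 * n)
    elif side == "max":  # positive anomalies only
        g = (y.max() - mean) / s
        p = alpha / n
    elif side == "min":  # negative anomalies only
        g = (mean - y.min()) / s
        p = alpha / n
    else:
        raise ValueError("side must be 'two-sided', 'max' or 'min'")

    # Upper critical value of the t-distribution with n - 2 degrees of freedom.
    t = stats.t.ppf(1 - p, n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return g, g_crit, bool(g > g_crit)


# Illustrative sample with one suspiciously large value.
sample = [199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57]

two_sided = grubbs_test(sample, alpha=0.05, side="two-sided")
upper = grubbs_test(sample, alpha=0.05, side="max")
lower = grubbs_test(sample, alpha=0.05, side="min")
```

For this sample the two-sided and upper one-sided tests reject the hypothesis of no outliers, while the lower one-sided test does not, since the smallest value lies close to the bulk of the data.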