business-science/anomalize

Sign error in limits of GESD method

frseguin opened this issue · 4 comments

I believe there is a sign error in the GESD method in anomalize.methods.R.

limit_upper = critical_value * mad - median

Specifically, the issue is concerning the derivation of the upper bound derived from the critical value from the GESD method. These bounds are stored in the outlier_report element of the output when using verbose = TRUE in the gesd function. The bounds are derived from the equation

\frac{\left|x_m - \mathrm{median}(X)\right|}{\mathrm{MAD}} \leq \lambda

which after rearranging gives

\mathrm{median}(X) - \lambda \cdot \mathrm{MAD}  \leq x_m \leq \lambda \cdot \mathrm{MAD} + \mathrm{median}(X)

In particular, the right hand side is \lambda \cdot \mathrm{MAD} + \mathrm{median}(X) and not \lambda \cdot \mathrm{MAD} - \mathrm{median}(X)

I believe changing line 188 to

limit_upper = critical_value * mad + median

would fix the issue.

I concur; once stated that way it's algebraically obvious that $lower_limit = critical_value = upper_limit$, which your proposed fix corrects.

Thanks both @frseguin & @technocrat. I will update the package shortly.

Should be fixed now in the development version. Can you check to make sure it works for you?

To be clear @technocrat the lower bound was actually fine. The issue was that $upper_limit = -$lower_limit, which happens to work in the case where the median is zero. This is why the issue was not detected in the example provided in the vignette where the data is normal(0,1).

The changes of commit bf158eb does fix the issue completely, and I have confirmed that everything is working as far as I am concerned.

Thank you @mdancho84 for the fast reply!