contrib/anaisdg/anomalydetection mad function is incorrect
anussel5559 opened this issue · 1 comments
Currently, the diff_med table is calculated as such:
diff_med =
    diff
        |> median(column: "_value")
        |> map(fn: (r) => ({r with MAD: k * r._value}))
        |> filter(fn: (r) => r.MAD > 0.0)
This correctly assigns the MAD value to the MAD column in the diff_med table as k * median(abs(x - median(xi))). (The underlying _value column comes from the diff table, which holds the absolute difference between each individual value and the dataset's median.) However, that MAD column is unused in the output calculation:
output =
    join(tables: {diff: diff, diff_med: diff_med}, on: ["_time"], method: "inner")
        |> map(fn: (r) => ({r with _value: r._value_diff / r._value_diff_med}))
        |> map(
            fn: (r) =>
                ({r with level:
                        if r._value >= threshold then
                            "anomaly"
                        else
                            "normal",
                }),
        )
Note: the output table's _value column is calculated in the map as _value_diff / _value_diff_med. The _value column from the diff_med table is NOT the full MAD; it is only the median of the differences, median(abs(x - median(xi))), and is missing the multiplication by the k constant.
The fix here could be as simple as adjusting the map in the diff_med table to:
map(fn: (r) => ({r with _value: k * r._value}))
which would correctly assign the MAD to the _value column to be used in the final output calculation.
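To make the size of the discrepancy concrete, here is a minimal numeric sketch in Python (not Flux) of the two calculations described in this issue. It is not the package's code: the function name mad_scores, the sample values, k = 1.4826 and threshold = 3.0 are all assumptions chosen for illustration.

```python
from statistics import median

K = 1.4826        # assumed consistency constant (the k in this issue)
THRESHOLD = 3.0   # assumed anomaly threshold, for illustration only


def mad_scores(values, k=K, scale_by_k=True):
    """Score each value as abs(x - median) divided by the chosen denominator.

    scale_by_k=False mirrors the behaviour described in this issue
    (dividing by the bare median of the differences); scale_by_k=True
    divides by k * median(abs(x - median(xi))), i.e. the full MAD.
    """
    med = median(values)
    diffs = [abs(v - med) for v in values]  # plays the role of the diff table
    diff_med = median(diffs)                # median of the differences
    denom = k * diff_med if scale_by_k else diff_med
    return [d / denom for d in diffs]


def level(score, threshold=THRESHOLD):
    return "anomaly" if score >= threshold else "normal"


values = [1.0, 2.0, 3.0, 4.0, 7.0]  # hypothetical sample data
current = mad_scores(values, scale_by_k=False)
fixed = mad_scores(values, scale_by_k=True)

for v, c, f in zip(values, current, fixed):
    print(f"x={v}: current={c:.3f} ({level(c)}), with k={f:.3f} ({level(f)})")
```

Under these assumptions every score from the current calculation is exactly k times larger than the score computed against the full MAD. For the last sample point the score drops from 4.0 to roughly 2.70 once k is applied, so its label flips from "anomaly" to "normal" at a threshold of 3.0.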
This issue has had no recent activity and will be closed soon.