Markus-Go/rapidminer-anomalydetection

Generate ROC seems to mirror FP and FN values

mkoskamp opened this issue · 2 comments

When using the Generate ROC operator with a labeled set containing an outlier score as input, the confusion matrix always shows the same number of FPs and FNs, no matter which algorithm is used.
I used LOF on a global anomaly problem to illustrate that it would perform badly. It indeed missed all 1676 outliers; however, it marked exactly 1676 other items as outliers.
I cannot reproduce how the threshold for the prediction is chosen. It seems the truth label is taken into account when creating a prediction, which seems odd.
I posted this issue including pictures on the rapidminer forum.

https://community.rapidminer.com/discussion/56333/anomaly-extention-generate-roc-seems-to-mirror-fp-fn-rate

Dataset and models available if needed.

In the meantime I looked a bit further into this problem and reread Goldstein's article comparing anomaly detection algorithms. I believe the behaviour described above is intentional. Since there is no clear rule for choosing a threshold on outlier scores, the Generate ROC operator must choose a threshold for each algorithm that allows different algorithms to be compared. So I think it starts from the top and stops when the FP/FN counts are symmetrical.

Yes, you are right. Since there is no general rule for setting the outlier threshold, there is a standard way of evaluating: test all possible outlier thresholds (instances ordered by score) and compute a TPR and FPR for each one. This results in the ROC curve, which is delivered at the roc output of the ROC operator. The area under the curve (AUC) summarizes the ROC in a single number if that is preferred. The higher the AUC, the better the algorithm, regardless of the threshold setting (a fair evaluation).
The AUC is delivered as an output, both as a number and as a performance vector (for convenient logging).
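For illustration, here is a minimal sketch of this standard evaluation (not the extension's actual Java implementation): sort instances by descending outlier score, treat each cut point as a threshold, and accumulate TPR/FPR to get the ROC and its AUC.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC points and AUC by sweeping every possible outlier-score
    threshold (instances ordered by descending score)."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # most anomalous first
    labels = np.asarray(labels)[order]
    p = labels.sum()                       # number of true outliers
    n = len(labels) - p                    # number of normal instances
    tpr = np.concatenate(([0.0], np.cumsum(labels) / p))      # TP rate per cut
    fpr = np.concatenate(([0.0], np.cumsum(1 - labels) / n))  # FP rate per cut
    # trapezoidal area under the (fpr, tpr) curve
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    return fpr, tpr, auc

# toy example: higher score = more anomalous, label 1 = outlier
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1,   1,   0,   1,   0,   0]
fpr, tpr, auc = roc_auc(scores, labels)
```

Because every threshold is evaluated, the resulting AUC is independent of any single threshold choice, which is exactly why it allows a fair comparison between algorithms.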

There is also another way of evaluating, which should not be used because it is kind of cheating: suppose you know in advance the number of anomalies that should be detected. Then you can determine the perfect threshold. This is additionally done by every operator, resulting in a "prediction" column. This column is also evaluated in the ROC operator and leads to the accuracy measure you described earlier. Again: don't use this for a fair evaluation.
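A short sketch (hypothetical helper, not the operator's actual code) shows why this threshold choice produces the mirrored counts from the original report: if the top-k scored instances are predicted as outliers, where k equals the true anomaly count, then TP + FP = TP + FN, so FP always equals FN.

```python
import numpy as np

def predict_with_known_count(scores, n_anomalies):
    """'Cheating' threshold: mark the top-k scored instances as outliers,
    where k is the known number of true anomalies."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)            # most anomalous first
    pred = np.zeros(len(scores), dtype=int)
    pred[order[:n_anomalies]] = 1
    return pred

labels = np.array([1, 1, 0, 1, 0, 0, 0, 0])               # 3 true outliers
scores = np.array([0.2, 0.9, 0.8, 0.1, 0.7, 0.3, 0.4, 0.5])  # a weak detector
pred = predict_with_known_count(scores, labels.sum())

fp = int(np.sum((pred == 1) & (labels == 0)))
fn = int(np.sum((pred == 0) & (labels == 1)))
# Since #predicted positives == #true positives, FP == FN by construction,
# which matches the "exactly 1676 other items" observation in the issue.
```

This also explains why the truth label appears to be "taken into account": the label column is only needed to count the true anomalies, not to score individual instances.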