nasaharvest/dora

Add results module for correct vs. selected outliers

hannah-rae opened this issue · 4 comments

For use cases that have some validation data, we can create plots like this one to compare the performance of various algorithms for prioritizing outliers:

[image: plot comparing algorithms for prioritizing outliers]

wkiri commented

@vinr515 Thanks for adding this capability! I gave it a try and it works great. I noticed that the x-axis was being truncated a little too far, so I've put together a proposed update. I'll submit a PR for your review.

wkiri commented

@vinr515 It looks like this approach sorts the results by score and then looks them up in the labels in that order. Instead, we want to use the selections in the order provided (without sorting by score), since some methods (like DEMUD) score each item independently, so sorting would change the order. Also, if the user requested only the top N items, matching their sorted indices against the labels would not yield correct results. I recommend using the dts_sels variable, which already holds the indices in selection order; we can use it to index into the labels.

(I assume that this is the intended semantics of the labels: each line contains the index of an item and a '1' if it is an outlier and '0' otherwise.)
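To make the difference concrete, here is a minimal sketch of counting correct outliers in selection order versus score order. It is not the code in the repository: the helper names `parse_labels` and `cumulative_correct`, the comma-separated label format, and the toy values are all assumptions for illustration. Only `dts_sels` comes from the discussion above, and it is assumed here to be a plain list of item indices.

```python
from itertools import accumulate


def parse_labels(lines):
    """Parse label lines into {item index: 0 or 1}.

    Assumes each line looks like 'index,label' (the delimiter is a guess).
    """
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        idx, lab = line.split(",")
        labels[int(idx)] = int(lab)
    return labels


def cumulative_correct(selections, labels):
    """Running count of true outliers, following `selections` as given."""
    hits = [labels[i] for i in selections]
    return list(accumulate(hits))


# Toy data (hypothetical): five labeled items, three of them outliers.
labels = parse_labels(["0,0", "1,1", "2,0", "3,1", "4,1"])

# Indices in the order the method selected them (top 3 only),
# with the score each selection received.
dts_sels = [3, 0, 4]
scores = [0.2, 0.9, 0.5]

# Intended behavior: index into the labels in selection order.
in_order = cumulative_correct(dts_sels, labels)
print(in_order)  # [1, 1, 2]

# Sorting by score first reorders the selections and changes the curve.
by_score = [sel for _, sel in sorted(zip(scores, dts_sels), reverse=True)]
print(cumulative_correct(by_score, labels))  # [0, 1, 2]
```

Both curves end at the same total, but the intermediate points differ, which is what the plot of correct vs. selected outliers shows.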

wkiri commented

I believe I have fixed this, but I'd love to hear from others to ensure it is working as expected for them before closing this issue :)

Thanks for this fix. It works for me so I'll close it.