dagster-io/fake-star-detector

random sampling + accounting for P/R in final numbers

soodoku opened this issue · 0 comments

Dear All,

Loved the work!

Two small potential improvements:

  1. "When we tested this heuristic on the known fake stars in our dummy account, we found that while it could be very computationally expensive" --- one way around the cost is to run the heuristic on a random sample of stargazers and bound the percentage of fake stars with a confidence interval.
  2. "it was both very good at detecting fake accounts and also extremely accurate (98% precision and 85% recall)" --- the final numbers don't account for precision and recall (P/R), so the share of stars flagged is not yet a best guess of the true share of fakes. Here's what I mean: http://gojiberries.io/2021/05/30/best-guess-of-true-proportion-of-1s/
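
To make both suggestions concrete, here is a rough sketch in Python. It is not taken from the repo or from the linked post. All names (`wilson_interval`, `adjust_for_precision_recall`, `estimate_fake_share`, `is_fake`) are hypothetical, and `is_fake` stands in for whatever expensive heuristic is being run. The adjustment uses the identity true positives = flagged × precision, and all fakes = true positives / recall. It assumes the precision and recall measured on the dummy account carry over to the repo being analyzed, which may not hold because precision depends on how common fakes are. Scaling the interval endpoints also ignores the uncertainty in the precision and recall estimates themselves.

```python
import math
import random


def wilson_interval(flagged, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = flagged / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def adjust_for_precision_recall(flagged_share, precision, recall):
    """Convert the share flagged by the detector into an estimated true share.

    flagged * precision = true positives; true positives / recall = all fakes.
    """
    if not 0 <= precision <= 1 or not 0 < recall <= 1:
        raise ValueError("precision must be in [0, 1] and recall in (0, 1]")
    return min(1.0, flagged_share * precision / recall)


def estimate_fake_share(stargazers, is_fake, sample_size,
                        precision, recall, seed=0):
    """Run the (hypothetical) heuristic on a random sample only."""
    pool = list(stargazers)
    rng = random.Random(seed)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    flagged = sum(1 for s in sample if is_fake(s))
    low, high = wilson_interval(flagged, len(sample))
    return {
        "sampled": len(sample),
        "flagged": flagged,
        # Endpoints are scaled the same way as the point estimate would be;
        # this ignores uncertainty in precision and recall.
        "low": adjust_for_precision_recall(low, precision, recall),
        "high": adjust_for_precision_recall(high, precision, recall),
    }
```

With the numbers quoted above (98% precision, 85% recall), a flagged share of 10% would translate to roughly 0.10 × 0.98 / 0.85 ≈ 11.5% under these assumptions, so the raw flagged share would understate the true share.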