random sampling + accounting for P/R in final numbers
soodoku opened this issue · 0 comments
Dear All,
Loved the work!
Two small potential improvements:
- "When we tested this heuristic on the known fake stars in our dummy account, we found that while it could be very computationally expensive" --- one way around the cost is to run the heuristic on a random sample of stars and put a confidence bound on the percentage of fake stars.
- "it was both very good at detecting fake accounts and also extremely accurate (98% precision and 85% recall)" --- the final numbers don't account for precision and recall (P/R), so the reported share of fakes is the flagged share rather than a best guess of the true share. Here's what I mean: http://gojiberries.io/2021/05/30/best-guess-of-true-proportion-of-1s/
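
To make both suggestions concrete, here is a rough Python sketch; it is not the project's code and not necessarily the method in the linked post. The names `sampled_fake_share`, `wilson_interval`, `adjust_for_precision_recall`, and the `is_fake` callback are all hypothetical. The sampling part assumes stars are drawn uniformly at random and uses a Wilson score interval as one possible bound. The adjustment assumes the 98% precision and 85% recall measured on the dummy account carry over to the stars being scored, which may not hold if the base rate of fakes differs.

```python
import math
import random


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion (k successes in n trials).

    z=1.96 gives an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def sampled_fake_share(stargazers, is_fake, sample_size, seed=0):
    """Run the (expensive) check on a uniform random sample only.

    `is_fake` is a hypothetical callback standing in for the heuristic.
    Returns the flagged share in the sample and a Wilson interval around it.
    """
    rng = random.Random(seed)
    n = min(sample_size, len(stargazers))
    sample = rng.sample(stargazers, n)  # sampling without replacement
    k = sum(1 for s in sample if is_fake(s))
    return k / n, wilson_interval(k, n)


def adjust_for_precision_recall(flagged_share, precision, recall):
    """Best guess of the true share of fakes given the flagged share.

    flagged_share * precision = share that is flagged AND truly fake;
    dividing by recall scales that up to all true fakes, caught or missed.
    Assumes precision and recall transfer to this population.
    """
    if not (0 < recall <= 1 and 0 <= precision <= 1):
        raise ValueError("precision must be in [0, 1] and recall in (0, 1]")
    return min(1.0, flagged_share * precision / recall)


if __name__ == "__main__":
    # Toy data: every 10th stargazer id is treated as fake.
    stars = list(range(1000))
    share, (lo, hi) = sampled_fake_share(stars, lambda s: s % 10 == 0, 200)
    print(f"flagged share in sample: {share:.3f}, interval: ({lo:.3f}, {hi:.3f})")
    print(f"adjusted: {adjust_for_precision_recall(share, 0.98, 0.85):.3f}")
```

With recall below precision, as here, the adjusted share comes out higher than the flagged share, because the heuristic misses more fakes than it wrongly flags. The same adjustment can be applied to both ends of the sampling interval, though that ignores the uncertainty in the precision and recall estimates themselves.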