r888888888/reportbooru

Incorrect numbers on Top Taggers report


User Chiera reported to me on Discord that the Top Taggers report appeared to be far off for user Nova_Genesis. Chiera pointed to the Upload Tags report for that user, which does suggest the user adds far more tags on average than the reported 7.

I checked for myself using the API of the post_versions controller. These are the numbers I came up with for that user between 2017-08-25 08:40:54 UTC and 2017-09-24 08:40:54 UTC:

Total Uploads: 296
Tag Mean: 24.3
Tag Median: 29
Tag Q1: 7
Tag Q3: 37
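For reference, summary figures like these can be reproduced from a list of per-upload tag counts. The sketch below is only an assumption about how such numbers might be computed, not the report's actual code. The function name `tag_summary` and its input are hypothetical. Quartile conventions also vary, so Q1 and Q3 can differ slightly depending on the method used.

```python
import statistics


def tag_summary(tag_counts):
    """Summarize per-upload tag counts as upload count, mean, median, Q1 and Q3.

    Hypothetical helper: assumes `tag_counts` holds one integer per upload.
    Quartiles use Python's default "exclusive" method, which is only one of
    several common conventions.
    """
    if len(tag_counts) < 2:
        # statistics.quantiles needs at least two data points on older Pythons
        raise ValueError("need at least two uploads to compute quartiles")
    q1, _, q3 = statistics.quantiles(tag_counts, n=4)
    return {
        "uploads": len(tag_counts),
        "mean": round(statistics.mean(tag_counts), 1),
        "median": statistics.median(tag_counts),
        "q1": q1,
        "q3": q3,
    }
```

A large gap between the mean computed this way and the figure in the report is a quick signal that the report's input rows are incomplete.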

It looks like versions are missing from the BigQuery table.
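One way to confirm which versions are missing is to compare the version IDs in the source database against those present in the exported table for the same time window. The sketch below assumes both sides can be reduced to integer version IDs. The function names and inputs are hypothetical and do not come from the reportbooru code.

```python
def find_missing_versions(source_ids, exported_ids):
    """Return sorted version IDs present in the source but absent from the export.

    Both arguments are iterables of integer version IDs (hypothetical inputs,
    e.g. IDs fetched from the post_versions API and from the BigQuery table).
    """
    return sorted(set(source_ids) - set(exported_ids))


def contiguous_ranges(ids):
    """Collapse sorted IDs into inclusive (start, end) ranges for targeted backfills."""
    ranges = []
    for version_id in ids:
        if ranges and version_id == ranges[-1][1] + 1:
            # Extend the current range by one ID
            ranges[-1] = (ranges[-1][0], version_id)
        else:
            # Start a new range
            ranges.append((version_id, version_id))
    return ranges
```

Grouping the gaps into ranges keeps a backfill script from re-exporting versions that are already present.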

I'll run a script to backpopulate the past month and see whether that improves accuracy. I've also reduced the batch size of the BigQuery export. Maybe batches of 1000 × tag count rows were erroring out?
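If each exported version expands into one row per tag, a fixed batch of 1000 versions can produce a very large request. A minimal sketch of batching by a row budget instead of a version count follows. The one-row-per-tag layout, the `"tags"` key and the `max_rows` default are all assumptions for illustration.

```python
def batch_by_row_budget(versions, max_rows=1000):
    """Group versions into batches whose expanded row count stays within max_rows.

    Assumes each version is a dict with a "tags" list and that the exporter
    writes one row per tag (hypothetical layout). A version with more tags
    than max_rows still gets a batch of its own rather than being dropped.
    """
    batches, current, current_rows = [], [], 0
    for version in versions:
        rows = max(1, len(version["tags"]))
        if current and current_rows + rows > max_rows:
            # Flush the current batch before it would exceed the budget
            batches.append(current)
            current, current_rows = [], 0
        current.append(version)
        current_rows += rows
    if current:
        batches.append(current)
    return batches
```

Budgeting by rows rather than versions keeps request sizes roughly constant even when some posts carry many tags.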

The latest report seems to be far off the usual values.

https://isshiki.donmai.us/user-reports/taggers/2018-02-04_v1.html

I compiled my own report over roughly the same period, and it showed the expected values.

http://testbooru.donmai.us/wiki_pages/top_tagger_report
http://testbooru.donmai.us/wiki_pages/bottom_tagger_report

There may be some double counting in the missing-version exporter. I'll look into adding duplicate checks to both exporters.
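A duplicate check can be as simple as skipping any version ID that has already been exported or already appears earlier in the current batch. The sketch below assumes each row is a dict with an `"id"` key holding the post version ID. Both that schema and the `already_exported` set are hypothetical.

```python
def dedupe_versions(rows, already_exported=()):
    """Drop duplicate version rows before export.

    Assumes each row is a dict whose "id" key holds the post version ID
    (hypothetical schema). Rows whose ID is in `already_exported`, or that
    appeared earlier in `rows`, are skipped, so running both exporters over
    overlapping ranges does not count a version twice.
    """
    seen = set(already_exported)
    unique = []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        unique.append(row)
    return unique
```

The same check could instead be applied at query time in the report itself, which would also catch duplicates that are already in the table.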