kiwix/container-images

After migration to k8s, stats of download.kiwix.org look strange

Closed this issue · 3 comments

First, the number of unique visitors to download.kiwix.org has dropped sharply since the migration to K8s, see
image

Maybe this is because of a wrong interpretation of how a unique visitor is counted, but the documentation does not help validate this scenario: https://matomo.org/faq/general/faq_43/ and https://matomo.org/faq/general/faq_21418/.

In addition, after the migration, we have days where no logs were uploaded at all
image

The pretty strong increase over the last day is not really explainable either.

@rgaudin All of this leads me to think that web server logs are not always uploaded properly.

First of all, analyzing stats numbers alone is probably a bad idea. Traffic fluctuates and Kiwix is a widely known project, so except for major events (incoming traffic or a technical issue), one can only make variably-informed guesses based on context.

That being said, here's what I (now) know:

  • Prior to k8s, we were running the import script many times a day over the same log file, so each log entry was uploaded as many times as the script ran after the visit. Matomo is not verbose about how non-JS visits are counted. The docs indicate that User-agent + IP is considered (over a 30-minute time frame), but does this apply to such uploads? The upload script itself doesn't do any kind of uniqueness guessing (at least in the version we're using). It does filter out many requests (errors, bots, search engines, static files, etc.). I've tried re-running a full day's log; there's a significant increase (10%+) but it's not crazy.
  • We had a long period during which the proxy was not forwarding the client's IP, resulting in all requests being assigned to the same IP. Those are not all counted as a single unique visitor but are grouped based on other criteria (probably time and user-agent). For Sunday April 17th, for instance, we have one unique visitor (no user-agent) with 17,435 hits and another one (same IP) on Android with 2 hits. This was fixed on April 21st, and the bump is clearly visible in the graph.
  • We failed to upload stats for download.kiwix.org four Sundays in a row (10/04, 17/04, 24/04 and 30/04). This is linked to the db issue every Monday morning, and I have re-imported them manually, except for April 10th, which is past the logrotate limit.
  • Stats now look on par with 2022's pre-war values, mostly since Apr 21st, so I'd say that the IP issue and the missing Sundays were the main causes, especially as traffic was still high due to East-European traffic.
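
To illustrate the second point, here is a minimal sketch of why losing the client IP at the proxy shrinks the visitor count. It assumes a simplified model in which hits sharing the same (IP, user-agent) pair within 30 minutes are merged into one visit; Matomo's real logic for imported logs is not documented in enough detail to confirm this. The function names, the sample hits and the `10.0.0.1` proxy address are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumed visit window, based on the 30-minute time frame mentioned in the docs.
WINDOW = timedelta(minutes=30)


def count_visits(hits, window=WINDOW):
    """Count visits under a simplified model (NOT Matomo's actual algorithm).

    Each hit is a (timestamp, ip, user_agent) tuple. Hits with the same
    (ip, user_agent) are merged into one visit as long as each hit occurs
    within `window` of the previous hit for that pair.
    """
    last_seen = {}
    visits = 0
    for ts, ip, ua in sorted(hits):
        key = (ip, ua)
        previous = last_seen.get(key)
        if previous is None or ts - previous > window:
            visits += 1
        last_seen[key] = ts
    return visits


def collapse_ip(hits, proxy_ip="10.0.0.1"):
    """Simulate a proxy that does not forward the client's IP (hypothetical address)."""
    return [(ts, proxy_ip, ua) for ts, _ip, ua in hits]


t0 = datetime(2022, 4, 17, 12, 0)
hits = [
    (t0, "198.51.100.1", "Firefox"),
    (t0 + timedelta(minutes=5), "198.51.100.2", "Firefox"),
    (t0 + timedelta(minutes=10), "198.51.100.3", "Firefox"),
    (t0 + timedelta(minutes=12), "198.51.100.3", "Android"),
]

with_client_ips = count_visits(hits)
behind_proxy = count_visits(collapse_ip(hits))
print(with_client_ips, behind_proxy)
```

In this toy data, four distinct clients are merged into two visits once every request carries the proxy's address, which matches the pattern seen on April 17th of a few "visitors" with a very large number of hits.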

@rgaudin Is the Sunday problem fixed... or is this a duplicate of another ticket?