eXist-db/public-repo

Improve Tracking


Since version 2.0.0 of the public repo, package uploads and downloads are tracked.

The current implementation does not differentiate in any way where those requests originate.
It would be beneficial to identify the following clients:

  • packageservice
  • web / browser
  • ivy / buildsystems

We identified several additional events or data points that would make sense to track:

  • calls to the /find route
    • especially the minimum processor version requirement
    • maybe also what was requested
  • calls to the /update, /admin and /packages routes that yield no results or throw an error
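
To make the list above more concrete, here is a rough sketch of what such an event record and the "track or skip" decision could look like. It is written in Python purely for illustration, not in the language of the app. The parameter names `name` and `processor`, the field names, and the rule "HTTP status of 400 or above counts as an error" are assumptions, not the current implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Routes named in this issue. /find is always tracked; the others are
# only tracked when they yield no results or fail.
ALWAYS_TRACKED = {"/find"}
TRACKED_ON_FAILURE = {"/update", "/admin", "/packages"}


@dataclass
class TrackingEvent:
    route: str
    status: int
    result_count: int
    requested: Optional[str] = None          # what was asked for (hypothetical field)
    processor_version: Optional[str] = None  # minimum processor version requirement


def should_track(route: str, status: int, result_count: int) -> bool:
    """Decide whether a request is worth logging."""
    if route in ALWAYS_TRACKED:
        return True
    if route in TRACKED_ON_FAILURE:
        # Assumption: a status of 400 or above means the call threw an error.
        return status >= 400 or result_count == 0
    return False


def build_event(route: str, params: Dict[str, str], status: int,
                result_count: int) -> Optional[TrackingEvent]:
    """Return an event for requests that should be tracked, else None."""
    if not should_track(route, status, result_count):
        return None
    return TrackingEvent(
        route=route,
        status=status,
        result_count=result_count,
        requested=params.get("name"),                # hypothetical parameter name
        processor_version=params.get("processor"),   # hypothetical parameter name
    )
```

Keeping the decision in one small function would make it easy to add or drop routes later without touching the logging itself.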

For identifying clients, could we use the User-Agent header?

I know we can set this header in EXPath HTTP Client requests. I wonder whether we have such control in our build systems.

I see that Maven allows custom user agents: https://stackoverflow.com/questions/1561658/how-can-the-user-agent-be-changed-in-maven
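
A minimal sketch of how the three client groups might be told apart by User-Agent, again in Python just to illustrate the idea. The patterns are guesses: the strings our package service and the build tools actually send would have to be checked against real requests before relying on any of this.

```python
import re
from typing import Optional

# Hypothetical patterns, checked in order. Build tools come first because
# some non-browser clients also include "Mozilla" in their User-Agent.
CLIENT_PATTERNS = [
    ("buildsystem", re.compile(r"\b(apache-maven|ivy|gradle)\b", re.IGNORECASE)),
    ("packageservice", re.compile(r"exist-?db|expath", re.IGNORECASE)),
    ("browser", re.compile(r"mozilla", re.IGNORECASE)),
]


def classify_client(user_agent: Optional[str]) -> str:
    """Map a raw User-Agent header to a coarse client category."""
    if not user_agent:
        return "unknown"
    for category, pattern in CLIENT_PATTERNS:
        if pattern.search(user_agent):
            return category
    return "unknown"
```

Anything that matches none of the patterns ends up as "unknown", so legacy clients would still be counted even if we cannot name them yet.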

UA strings are one way, yes. This is especially interesting because it might let us "see" legacy clients as well.
@adamretter suggested adding specific parameters that would identify the origin, but it is almost impossible to add those to previous versions of build scripts, package managers and the like.
We should explore whether there is even more information we could add to each logged event. The raw request data - minus the IP address - holds all kinds of valuable information.
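
As a sketch of the "minus IP address" part, the request headers could be filtered before they are stored. The list of headers that may carry a client address is an assumption and probably incomplete. Python is used only for illustration.

```python
from typing import Dict

# Headers that commonly carry the client address when a proxy is involved.
# Assumption: this set would need to be adjusted to the actual deployment.
IP_HEADERS = {"x-forwarded-for", "x-real-ip", "forwarded"}


def loggable_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers without any IP-bearing entries."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IP_HEADERS  # header names are case-insensitive
    }
```

The remote address of the connection itself would simply never be written to the event.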

Only if storage does not grow too fast and performance does not suffer.