DIRACGrid/diracx

Requirements for Parameter DBs

Closed this issue · 2 comments

Parameter DBs (at a minimum for JobParameters and PilotParameters) need the following:

  • Store key-value pairs.
  • The keys should not be pre-set, and it should be possible to add new keys at any time.
  • It should be possible to search through the values. Practical example: we should be able to answer which job ran on a certain worker node at time X.
  • It should be possible to easily create plots in Grafana. Example: given a {"ModelName": "some_Intel_AMD_bof"} parameter, I want to see the current "composition" of my Grid, and the composition per site.
  • The lifetime of parameters should not be the same as the lifetime of their jobs/pilots.
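As a rough illustration of the first four requirements, here is a minimal in-memory sketch in Python. It is not DiracX code: the `ParameterStore` class, its methods, and the parameter names (`HostName`, `Site`, `StartTime`, `EndTime`) are all hypothetical, and a real backend would replace the dictionary with a database. The lifetime requirement is not modelled here.

```python
from collections import Counter, defaultdict


class ParameterStore:
    """Hypothetical in-memory stand-in for a parameter DB."""

    def __init__(self):
        # job_id -> {key: value}; keys are not pre-declared anywhere.
        self._rows = defaultdict(dict)

    def set_parameters(self, job_id, **params):
        """Add or overwrite arbitrary key-value pairs for a job."""
        self._rows[job_id].update(params)

    def search(self, predicate):
        """Return the sorted IDs of jobs whose parameters satisfy predicate."""
        return sorted(j for j, p in self._rows.items() if predicate(p))

    def composition(self, key, group_by=None):
        """Count jobs per value of key, optionally split by another key."""
        counts = Counter()
        for params in self._rows.values():
            if key not in params:
                continue
            if group_by is None:
                counts[params[key]] += 1
            else:
                counts[(params.get(group_by), params[key])] += 1
        return counts


store = ParameterStore()
store.set_parameters(1, HostName="wn01", Site="SiteA", ModelName="Intel",
                     StartTime=100, EndTime=200)
store.set_parameters(2, HostName="wn01", Site="SiteA", ModelName="AMD",
                     StartTime=250, EndTime=300)
store.set_parameters(3, HostName="wn02", Site="SiteB", ModelName="Intel",
                     StartTime=100, EndTime=400)
# A key nobody declared in advance can be added at any time.
store.set_parameters(3, CPUNormalizationFactor=12.5)

# "Which job ran on worker node wn01 at time 150?"
jobs_at_150 = store.search(
    lambda p: p.get("HostName") == "wn01"
    and p.get("StartTime", float("inf")) <= 150 <= p.get("EndTime", float("-inf"))
)

# Grid "composition", overall and per site.
overall = store.composition("ModelName")
per_site = store.composition("ModelName", group_by="Site")
```

The two `composition` calls correspond to the kind of aggregation a Grafana panel would issue, which is where the choice of backend starts to matter.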

After playing around with a dump of the LHCb ElasticJobParametersDB, I've come to the conclusion that MySQL is not going to play nicely with this use case. The count(*) queries for making dashboards are too slow when you have many rows due to MVCC. I also tried with postgres and, while it has a bunch of features that make it nicer, it still has the same fundamental issue.

I'm sure we could come up with something clever using triggers but it'd be non-trivial and doesn't seem worth it.
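For context, a trigger-based approach of the kind dismissed above might look roughly like the following sketch. It uses SQLite through Python's `sqlite3` module purely so that it runs self-contained; the table and trigger names are hypothetical, and this is not something that was tried against MySQL or postgres in this issue. The idea is to maintain a small table of pre-computed counts so that dashboards never run count(*) over the large table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: one row per (job, parameter name).
    CREATE TABLE job_parameters (
        job_id INTEGER,
        name   TEXT,
        value  TEXT,
        PRIMARY KEY (job_id, name)
    );

    -- Pre-aggregated counts that a dashboard would read instead.
    CREATE TABLE parameter_counts (
        name  TEXT,
        value TEXT,
        n     INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (name, value)
    );

    CREATE TRIGGER job_parameters_after_insert
    AFTER INSERT ON job_parameters
    BEGIN
        -- Create the counter row if it does not exist yet, then bump it.
        INSERT OR IGNORE INTO parameter_counts (name, value, n)
            VALUES (NEW.name, NEW.value, 0);
        UPDATE parameter_counts SET n = n + 1
            WHERE name = NEW.name AND value = NEW.value;
    END;

    CREATE TRIGGER job_parameters_after_delete
    AFTER DELETE ON job_parameters
    BEGIN
        UPDATE parameter_counts SET n = n - 1
            WHERE name = OLD.name AND value = OLD.value;
    END;
    """
)

conn.executemany(
    "INSERT INTO job_parameters VALUES (?, ?, ?)",
    [(1, "ModelName", "Intel"), (2, "ModelName", "AMD"), (3, "ModelName", "Intel")],
)
conn.execute("DELETE FROM job_parameters WHERE job_id = 2")

# The dashboard query reads a handful of counter rows, not the full table.
counts = dict(
    conn.execute("SELECT value, n FROM parameter_counts WHERE name = 'ModelName'")
)
```

Even this toy version hints at why it would be non-trivial: it ignores updates to existing values, each extra grouping (per site, per time window) would need its own counter table and triggers, and on a busy server many concurrent writers would likely contend on the same counter rows.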

IIUC, we keep the current OpenSearch-based solution. If that is the case, then at least DIRACGrid/DIRAC#7292 could be evaluated.