Some Lodestar dashboard metrics are broken
nflaig opened this issue · 4 comments
Problem
Noticed some metrics are broken, likely since #1621, because it relabels the job label to use the Docker service name:
eth-docker/prometheus/base-config.yml
Lines 30 to 31 in 64a933b
The problem is that the dashboard uses job as a selector in some metrics, e.g. lodestar_version{job=~"$beacon_job|beacon"}, but the job is now called consensus.
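To make the mismatch concrete, here is a small sketch that emulates how the selector behaves. Prometheus regex matchers are fully anchored, so `re.fullmatch` is a reasonable stand-in. The default value `beacon` for `$beacon_job` is an assumption for illustration, and `expand` and `job_matches` are hypothetical helper names.

```python
import re


def expand(template: str, beacon_job: str) -> str:
    # Grafana substitutes the dashboard variable before the query is sent.
    return template.replace("$beacon_job", beacon_job)


def job_matches(selector_regex: str, job: str) -> bool:
    # Prometheus =~ matchers are fully anchored, hence fullmatch.
    return re.fullmatch(selector_regex, job) is not None


template = "$beacon_job|beacon"  # regex taken from the dashboard selector

# Assumed default: the variable is left at "beacon".
default_selector = expand(template, "beacon")
print(job_matches(default_selector, "beacon"))     # job name before relabeling
print(job_matches(default_selector, "consensus"))  # job name after relabeling

# Setting the variable to the new job name makes the selector match again.
patched_selector = expand(template, "consensus")
print(job_matches(patched_selector, "consensus"))
```

Under these assumptions the selector only matches the relabeled job once the variable itself is set to consensus, which would explain why the affected panels are empty.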
Note that the job name in the ls prom config is still correct (as updated in #1140):
eth-docker/prometheus/rootless/ls-prom.yml
Lines 1 to 2 in 64a933b
Possible Solutions
Job name has to match beacon, or maybe a better solution is to just update the label selector when downloading the dashboard:
eth-docker/grafana/provision.sh
Line 54 in 64a933b
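As a rough illustration of the second option, the sketch below rewrites the selector in every string of a parsed dashboard. It is not what provision.sh does; `rewrite_strings` is a hypothetical helper, the sample dashboard is a trimmed stand-in, and adding consensus to the alternation is just one possible rewrite.

```python
def rewrite_strings(node, old: str, new: str):
    """Return a copy of a parsed JSON tree with `old` replaced in all string values."""
    if isinstance(node, str):
        return node.replace(old, new)
    if isinstance(node, list):
        return [rewrite_strings(item, old, new) for item in node]
    if isinstance(node, dict):
        return {key: rewrite_strings(value, old, new) for key, value in node.items()}
    return node  # numbers, booleans and null pass through unchanged


# Hypothetical, heavily trimmed dashboard for illustration only.
dashboard = {
    "panels": [
        {"targets": [{"expr": 'lodestar_version{job=~"$beacon_job|beacon"}'}]}
    ]
}

patched = rewrite_strings(
    dashboard,
    "$beacon_job|beacon",            # selector regex used by the dashboard
    "$beacon_job|beacon|consensus",  # assumed rewrite: also accept the new job name
)
print(patched["panels"][0]["targets"][0]["expr"])
```

Working on the parsed JSON rather than the raw file avoids having to deal with escaped quotes inside the query strings.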
An even better solution would be for the dashboard to use metrics and not filter on the job name at all. lodestar_version should be returned from CL and VC I assume?
Another way to skin this may be: If the variable $beacon_job exists then what happens when it is set to consensus?
It's still far cleaner to not use job in Dashboards, as the job could be anything. The only thing you know will be there are the metrics themselves, however the user chose to scrape them.
I agree that it is not ideal to rely on the job label. I added the variables a while ago (ChainSafe/lodestar#5211) to let a user set a custom value when importing the dashboard, which works well, but this is not supported during dashboard provisioning (grafana/grafana#10786).
An even better solution would be for the dashboard to use metrics and not filter on the job name at all. lodestar_version should be returned from CL and VC I assume?
Yes, we use the same metric name lodestar_version in vc and bn. I would have to check what the reason for this was, because all other custom metrics on the validator client are prefixed with vc_.
But this still does not solve the issue that we will need to differentiate based on some label, as Prometheus defines a set of standard process metrics which should be prefixed with process_. Now, we could prefix those as well, but I think the purpose of this is to make these metrics work with a whole range of standardized dashboards that want to collect these data points.
I have a solution, but it's brittle and will break when the Lodestar dashboard gets updated and the location of these variables in the array changes:
jq '.templating.list[3].query |= "consensus" | .templating.list[4].query |= "validator"'
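One way to reduce the brittleness might be to pick the variables by name instead of by array index. The sketch below shows the idea on a trimmed, made-up dashboard. The name `beacon_job` comes from the selector quoted in the issue, while `validator_job`, `DS_PROMETHEUS` and the helper `set_variable_queries` are assumptions for illustration.

```python
import json


def set_variable_queries(dashboard: dict, overrides: dict) -> dict:
    """Set the query of templating variables selected by name, not by index."""
    pending = dict(overrides)
    for variable in dashboard.get("templating", {}).get("list", []):
        name = variable.get("name")
        if name in pending:
            variable["query"] = pending.pop(name)
    if pending:
        # Fail loudly if a variable was renamed or removed upstream.
        raise KeyError(f"variables not found: {sorted(pending)}")
    return dashboard


# Hypothetical, heavily trimmed dashboard JSON for illustration only.
dashboard = json.loads("""
{
  "templating": {
    "list": [
      {"name": "DS_PROMETHEUS", "query": "prometheus"},
      {"name": "validator_job", "query": "validator"},
      {"name": "beacon_job", "query": "beacon"}
    ]
  }
}
""")

patched = set_variable_queries(
    dashboard, {"beacon_job": "consensus", "validator_job": "validator"}
)
print(json.dumps(patched["templating"]["list"], indent=2))
```

This keeps working if the variables move within the list, and it raises an error instead of silently patching the wrong entry if one of them disappears. The same selection by name should also be expressible in jq with `select(.name == ...)`.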
Thanks for the fix, will keep that in mind