eth-educators/eth-docker

Some Lodestar dashboard metrics are broken

nflaig opened this issue · 4 comments

Problem

Noticed some metrics are broken


Likely broken since #1621, because it relabels the job label to use the Docker Compose service name:

- source_labels: [__meta_docker_container_label_com_docker_compose_service]
  target_label: job

The problem is that the dashboard uses job as a selector in some queries, e.g. lodestar_version{job=~"$beacon_job|beacon"}, but the job is now called consensus.
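
Why the panels go blank: Grafana substitutes $beacon_job into the query before it is sent, and a Prometheus regex matcher (=~) has to match the entire label value. The Python sketch below models both steps under stated assumptions: interpolate and job_matches are hypothetical helpers, and the default value "beacon" for $beacon_job is assumed. The second case corresponds to the question raised later in the thread about setting $beacon_job to consensus.

```python
import re


def interpolate(expr, variables):
    """Minimal stand-in for Grafana variable substitution ($name form only).

    Unknown variables are left untouched.
    """
    return re.sub(r"\$(\w+)", lambda m: variables.get(m.group(1), m.group(0)), expr)


def job_matches(job_value, pattern):
    """Prometheus =~ matchers are anchored at both ends, like re.fullmatch."""
    return re.fullmatch(pattern, job_value) is not None


# Regex part of the selector job=~"$beacon_job|beacon".
selector_regex = "$beacon_job|beacon"

# Assumed default: $beacon_job left at "beacon", so "consensus" matches nothing.
print(job_matches("consensus", interpolate(selector_regex, {"beacon_job": "beacon"})))     # False
# $beacon_job overridden to the relabeled job name.
print(job_matches("consensus", interpolate(selector_regex, {"beacon_job": "consensus"})))  # True
```

Because the match is anchored, a near miss such as beacon-1 would not match beacon either.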

Note that the job name in the ls prom config is still correct (as updated in #1140):

scrape_configs:
  - job_name: 'beacon'
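
The two config fragments fit together as follows: Prometheus sets job from job_name on each target before relabel_configs run, and a rule with only source_labels and target_label uses the defaults (action replace, regex (.*), replacement $1, separator ;), so it overwrites job with the Compose service name. The Python sketch below is a simplified model, assuming the Lodestar target is scraped by a job named 'beacon' that also has the #1621 rule applied; apply_replace_relabel is a hypothetical helper that handles only $N references, not the full Prometheus relabeling semantics.

```python
import re


def apply_replace_relabel(labels, source_labels, target_label,
                          regex="(.*)", replacement="$1", separator=";"):
    """Simplified model of a Prometheus 'replace' relabel rule.

    Joins the source label values, fully anchors the regex, and on a match
    writes the expanded replacement into target_label. An empty result
    removes the target label; no match leaves the labels unchanged.
    """
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)
    # Translate Prometheus-style $1 references into Python's \g<1> syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    new_value = match.expand(template)
    if new_value:
        result[target_label] = new_value
    else:
        result.pop(target_label, None)
    return result


labels = {
    "job": "beacon",  # assumed: taken from job_name: 'beacon'
    "__meta_docker_container_label_com_docker_compose_service": "consensus",
}
relabeled = apply_replace_relabel(
    labels,
    source_labels=["__meta_docker_container_label_com_docker_compose_service"],
    target_label="job",
)
print(relabeled["job"])  # consensus
```

So the job_name in the scrape config can be correct and still not be the job label the dashboard sees.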

Possible Solutions

Either the job name has to match beacon, or, maybe a better solution, the label selector could simply be updated when the dashboard is downloaded.
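
Updating the selector at download time could look roughly like the Python sketch below. It walks the dashboard JSON and rewrites every Prometheus "expr" string, including panels nested inside rows. The OLD and NEW strings are illustrative assumptions (keeping beacon and adding consensus), and rewrite_exprs is a hypothetical helper; eth-docker's own download step would more likely do this with jq or sed.

```python
def rewrite_exprs(node, old, new):
    """Return a copy of a dashboard with `old` replaced by `new` in every "expr" string."""
    if isinstance(node, dict):
        return {
            key: (value.replace(old, new)
                  if key == "expr" and isinstance(value, str)
                  else rewrite_exprs(value, old, new))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_exprs(item, old, new) for item in node]
    return node


# Illustrative selector strings (assumed, not taken from eth-docker).
OLD = 'job=~"$beacon_job|beacon"'
NEW = 'job=~"$beacon_job|beacon|consensus"'

# Tiny hypothetical dashboard: one top-level panel and one panel inside a row.
dashboard = {
    "panels": [
        {"title": "Version", "targets": [{"expr": 'lodestar_version{job=~"$beacon_job|beacon"}'}]},
        {"type": "row", "panels": [
            {"targets": [{"expr": 'lodestar_version{job=~"$beacon_job|beacon"}'}]},
        ]},
    ]
}
result = rewrite_exprs(dashboard, OLD, NEW)
```

In practice the dashboard would be read with json.load and written back with json.dump around this call. Matching the full selector string rather than just beacon avoids touching unrelated text.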

*lodestar* )

An even better solution would be for the dashboard to use metrics and not filter on the job name at all. lodestar_version should be returned from CL and VC I assume?

Another way to skin this may be: if the variable $beacon_job exists, what happens when it is set to consensus?

It's still far cleaner not to use job in dashboards, as the job could be anything. The only things you know will be there are the metrics themselves, however the user chose to scrape them.

I agree that it is not ideal to rely on the job label. I added the variables a while ago (ChainSafe/lodestar#5211) to let users set a custom value when importing the dashboard, which works well, but this is not supported during dashboard provisioning (grafana/grafana#10786).

An even better solution would be for the dashboard to use metrics and not filter on the job name at all. lodestar_version should be returned from CL and VC I assume?

Yes, we use the same metric name lodestar_version in the VC and BN. I would have to check the reason for this, because all other custom metrics on the validator client are prefixed with vc_.

But this still does not solve the issue that we need to differentiate based on some label, since Prometheus defines a set of standard process metrics which should be prefixed with process_. We could prefix those as well, but I think the purpose of the standard names is to make these metrics work with the whole range of standardized dashboards that want to collect these data points.
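
A toy illustration of that point: when the beacon node and validator client expose identically named series, a query by metric name alone returns both, and only a label such as job separates them. The series below are hypothetical (values omitted), the job values assume eth-docker's relabeling to Compose service names, and select is a hypothetical helper doing exact label matching.

```python
# Hypothetical scraped series as (metric name, labels); sample values omitted.
SERIES = [
    ("lodestar_version", {"job": "consensus"}),
    ("lodestar_version", {"job": "validator"}),
    ("process_cpu_seconds_total", {"job": "consensus"}),
    ("process_cpu_seconds_total", {"job": "validator"}),
]


def select(series, name, **matchers):
    """Return label sets of series with this metric name and exact label matches."""
    return [
        labels
        for metric, labels in series
        if metric == name and all(labels.get(k) == v for k, v in matchers.items())
    ]


print(len(select(SERIES, "process_cpu_seconds_total")))                  # 2: BN and VC mixed
print(len(select(SERIES, "process_cpu_seconds_total", job="consensus")))  # 1
```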

I have a solution, but it's brittle and will break when the Lodestar dashboard gets updated and the position of these variables in the array changes:
jq '.templating.list[3].query |= "consensus" | .templating.list[4].query |= "validator"'
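
One way to make that fix less brittle is to look the variables up by name instead of by array index. The Python sketch below does this and, like the jq command, changes only .query. beacon_job comes from $beacon_job above; validator_job, the other list entries, and their initial values are hypothetical placeholders, and set_textbox_defaults is a hypothetical helper.

```python
def set_textbox_defaults(dashboard, values):
    """Set .query on templating variables selected by name, not list index.

    `values` maps variable name -> new query value. Raises KeyError before
    changing anything if a name is missing, so a dashboard update fails
    loudly instead of silently editing the wrong entry.
    """
    by_name = {var.get("name"): var for var in dashboard["templating"]["list"]}
    missing = [name for name in values if name not in by_name]
    if missing:
        raise KeyError(f"templating variables not found: {missing}")
    for name, query in values.items():
        by_name[name]["query"] = query
    return dashboard


# Hypothetical dashboard fragment; real variable names and order may differ.
dashboard = {"templating": {"list": [
    {"name": "DS_PROMETHEUS", "type": "datasource"},
    {"name": "beacon_job", "type": "textbox", "query": "beacon"},
    {"name": "validator_job", "type": "textbox", "query": "vc"},
]}}
set_textbox_defaults(dashboard, {"beacon_job": "consensus", "validator_job": "validator"})
```

The same idea should be expressible in jq with a select on the name, e.g. '(.templating.list[] | select(.name == "beacon_job") | .query) |= "consensus"'.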

Thanks for the fix, will keep that in mind