bloomberg/goldpinger

goldpinger metrics explanation

k0nstantinv opened this issue · 4 comments

Hi! I find goldpinger very useful, but I can't work out what its metrics mean. Could you explain, please?

goldpinger_nodes_health_total{app="goldpinger",controller_revision_hash="6bd97ddd49",goldpinger_instance="node22",instance="10.44.35.135:8080",job="pods",kubernetes_namespace="goldpinger",kubernetes_pod_name="goldpinger-qzpd7",status="unhealthy"} 4

The result here is 4. What exactly does that mean?

goldpinger_nodes_health_total{goldpinger_instance="node30",instance="node08",job="goldpingers",status="unhealthy"} 8

The result here is 8. What exactly does that mean?

It means that the instance goldpinger_instance="node22" saw 4 peers (nodes/pods) with status="unhealthy" the last time it checked.

The goldpinger_instance="node30" instance saw 8 peers unhealthy, although I'm not sure why the two sets of labels are different in the two cases.
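To make that reading concrete, here is a minimal sketch that parses the two samples quoted above and reports, for each reporting goldpinger_instance, how many peers it saw as unhealthy. The helper names (parse_sample, unhealthy_seen_by) are made up for illustration, and the regexes only handle simple lines like these two, not the full Prometheus exposition format.

```python
import re

# The two samples quoted in this issue, copied verbatim.
SAMPLES = '''
goldpinger_nodes_health_total{app="goldpinger",controller_revision_hash="6bd97ddd49",goldpinger_instance="node22",instance="10.44.35.135:8080",job="pods",kubernetes_namespace="goldpinger",kubernetes_pod_name="goldpinger-qzpd7",status="unhealthy"} 4
goldpinger_nodes_health_total{goldpinger_instance="node30",instance="node08",job="goldpingers",status="unhealthy"} 8
'''

# Simplified patterns: metric name, a {label="value",...} block, then the value.
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for one sample line, or None if it does not match."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        return None
    labels = dict(LABEL_RE.findall(m.group("labels")))
    return m.group("name"), labels, float(m.group("value"))


def unhealthy_seen_by(text):
    """Map each reporting goldpinger_instance to the number of peers it saw as unhealthy."""
    result = {}
    for line in text.splitlines():
        parsed = parse_sample(line)
        if parsed is None:
            continue  # blank or unrecognised line
        name, labels, value = parsed
        if name == "goldpinger_nodes_health_total" and labels.get("status") == "unhealthy":
            # The key is the reporter (goldpinger_instance), not the peers it pinged.
            result[labels["goldpinger_instance"]] = int(value)
    return result


print(unhealthy_seen_by(SAMPLES))  # {'node22': 4, 'node30': 8}
```

The value is keyed by the instance doing the reporting, so the output says how many unhealthy peers each reporter saw, not which peers they were.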

Hope that helps!

@seeker89 thanks! Sorry for the confusion with the two sets of metrics. The second set comes from the Prometheus job, and the first comes from the pods directly (via pod annotation). Anyway, I still can't see how to use the metrics to detect which goldpinger instances are unhealthy from the point of view of a particular goldpinger instance.

No problem. The current metrics are from the POV of a particular pod - it reports what it sees. You'd need the reverse metrics to be able to say which particular instance is seen as unhealthy. That could be added, but it's not currently there. I think someone has asked about this before, so perhaps it's something we should add.
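As a rough illustration of the difference between the two views, the sketch below starts from hypothetical per-peer ping results (the observations dict and both function names are invented for this example and are not something goldpinger exposes as metrics). The first function gives the per-observer count, which is the shape of the current metric; the second inverts the data to show which instance is seen as unhealthy, and by whom.

```python
from collections import defaultdict

# Hypothetical data: observer -> {peer: True if the observer's ping to that peer succeeded}.
observations = {
    "node22": {"node08": False, "node30": True},
    "node30": {"node08": False, "node22": True},
    "node08": {"node22": True, "node30": True},
}


def unhealthy_counts(obs):
    """Current view: how many peers each observer sees as unhealthy."""
    return {
        observer: sum(1 for ok in peers.values() if not ok)
        for observer, peers in obs.items()
    }


def seen_unhealthy_by(obs):
    """Reverse view: for each peer, the observers that see it as unhealthy."""
    reverse = defaultdict(list)
    for observer, peers in obs.items():
        for peer, ok in peers.items():
            if not ok:
                reverse[peer].append(observer)
    return {peer: sorted(observers) for peer, observers in reverse.items()}


print(unhealthy_counts(observations))   # {'node22': 1, 'node30': 1, 'node08': 0}
print(seen_unhealthy_by(observations))  # {'node08': ['node22', 'node30']}
```

With only the counts, node22 and node30 each report one unhealthy peer, but nothing identifies that peer as node08; the inversion needs the per-peer detail.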

Okay! Thanks, I hope those metrics will be added soon.