bloomberg/goldpinger

Clarity on Master vs. Peer Response Time values

mattstam opened this issue · 1 comment

I've been trying to interpret the difference between "master_response_time" and "peers_response_time" in the ping results. The closest thing I can find is the help text on the corresponding Prometheus metrics:

# HELP goldpinger_kube_master_response_time_s Histogram of response times from kubernetes API server, when listing other instances
# HELP goldpinger_peers_response_time_s Histogram of response times from other hosts, when making peer calls

These descriptions still aren't very clear to me, especially given the results I get from each (at times I've seen peer response time be lower than master response time). One way I've been thinking about it is intranode vs. internode, but that doesn't completely make sense either, since our "master" pod (the one we're port-forwarding to) is still reaching pods on other nodes.

It would be great if these definitions could be elaborated on.

Hi @matt-stam, thanks for the issue!

The kube_master_response_time_s metric is the time it takes the particular pod to connect to the Kubernetes API to fetch the list of peers (Goldpinger instances) to ping.

The peers_response_time_s metric is the time it takes that pod to get a response from each peer when it makes a peer call.

Peer responses should be consistently faster than Kubernetes API responses. If they aren't, that suggests something in the setup needs tuning.
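One way to check this is to compare the mean of each histogram, since Prometheus histograms expose `_sum` and `_count` series alongside the buckets. The Go sketch below is only an illustration: `meanLatency` is a hypothetical helper, the sample values are invented, and the parser assumes simple lines with no spaces inside label values and no trailing timestamps.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is invented data in the Prometheus text format, not real output.
const sample = `
goldpinger_kube_master_response_time_s_sum 1.2
goldpinger_kube_master_response_time_s_count 40
goldpinger_peers_response_time_s_sum 0.9
goldpinger_peers_response_time_s_count 300
`

// meanLatency adds up every <metric>_sum and <metric>_count line
// (across all label sets) and returns sum/count in seconds.
// The second result is false when no observations were found.
func meanLatency(exposition, metric string) (float64, bool) {
	var sum, count float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and HELP/TYPE comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		name := fields[0]
		if i := strings.IndexByte(name, '{'); i >= 0 {
			name = name[:i] // drop the label set
		}
		val, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		switch name {
		case metric + "_sum":
			sum += val
		case metric + "_count":
			count += val
		}
	}
	if count == 0 {
		return 0, false
	}
	return sum / count, true
}

func main() {
	master, okM := meanLatency(sample, "goldpinger_kube_master_response_time_s")
	peers, okP := meanLatency(sample, "goldpinger_peers_response_time_s")
	if !okM || !okP {
		fmt.Println("missing observations for one of the metrics")
		return
	}
	fmt.Printf("mean API list time: %.4fs, mean peer time: %.4fs\n", master, peers)
	if peers >= master {
		fmt.Println("peer calls are not faster than the API call; worth investigating")
	}
}
```

In practice the input would come from an instance's metrics endpoint rather than a constant, and a mean can hide outliers, so the histogram buckets are still worth a look.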

Feel free to reopen if you have any more questions.