skynetservices/skydns

Metrics misreported

I noticed some odd behavior in the Prometheus metrics reported for kube-dns on our cluster. SkyDNS consistently reports skydns_skydns_dns_error_count_total{cause="nxdomain",system="auth"} with a value higher than skydns_skydns_dns_request_count_total{system="auth"}.

See output from the metrics endpoint below:

# HELP skydns_skydns_dns_cachemiss_count_total Counter of DNS requests that result in a cache miss.
# TYPE skydns_skydns_dns_cachemiss_count_total counter
skydns_skydns_dns_cachemiss_count_total{cache="response"} 4596
# HELP skydns_skydns_dns_error_count_total Counter of DNS requests resulting in an error.
# TYPE skydns_skydns_dns_error_count_total counter
skydns_skydns_dns_error_count_total{cause="nxdomain",system="auth"} 7576
# HELP skydns_skydns_dns_request_count_total Counter of DNS requests made.
# TYPE skydns_skydns_dns_request_count_total counter
skydns_skydns_dns_request_count_total{system="auth"} 4596

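Going by the HELP descriptions of these metrics, each error should correspond to a request, so the error count should never exceed the request count. Should this even be possible?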
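For anyone who wants to check the same thing on their own cluster, here is a minimal sketch that parses the metrics text and compares the two counters. The helper names (`parse_metrics`, `errors_vs_requests`) are made up for illustration, and the parser is deliberately simplistic: it assumes sample lines shaped like the ones pasted above (no timestamps, no escaped quotes in label values), not the full Prometheus exposition format.

```python
import re

# Sample lines copied from the metrics endpoint output above.
SAMPLE = """\
# HELP skydns_skydns_dns_cachemiss_count_total Counter of DNS requests that result in a cache miss.
# TYPE skydns_skydns_dns_cachemiss_count_total counter
skydns_skydns_dns_cachemiss_count_total{cache="response"} 4596
# HELP skydns_skydns_dns_error_count_total Counter of DNS requests resulting in an error.
# TYPE skydns_skydns_dns_error_count_total counter
skydns_skydns_dns_error_count_total{cause="nxdomain",system="auth"} 7576
# HELP skydns_skydns_dns_request_count_total Counter of DNS requests made.
# TYPE skydns_skydns_dns_request_count_total counter
skydns_skydns_dns_request_count_total{system="auth"} 4596
"""

# name, optional {labels}, then a single value (assumption: no timestamp).
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Map (metric name, frozenset of label pairs) to the sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = frozenset(LABEL_RE.findall(raw_labels or ""))
        samples[(name, labels)] = float(value)
    return samples


def errors_vs_requests(samples, system="auth"):
    """Return (errors, requests, errors > requests) for one system label."""
    def total(metric):
        return sum(
            value
            for (name, labels), value in samples.items()
            if name == metric and ("system", system) in labels
        )

    errors = total("skydns_skydns_dns_error_count_total")
    requests = total("skydns_skydns_dns_request_count_total")
    return errors, requests, errors > requests
```

With the sample above, `errors_vs_requests(parse_metrics(SAMPLE))` reports 7576 errors against 4596 requests for system="auth", which is the mismatch described in this issue.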

Versions below:

Kube-DNS: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.8
Kubernetes: Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T20:49:26Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}