Motivation
- kube-dns is a critical component of a Kubernetes cluster, so it needs to be working at all times.
- If there is a problem with the nodes, the DNS pods, a misconfigured network, or routing on the path to the kube-dns service and on to the kube-dns pods, you need to know about it.
- Trigger alerts based on the error rate.
- Metrics are exposed at container:8080/metrics. Prometheus annotations are already in place, so the endpoint is scraped automatically.
- Alert expression: `sum(rate(dns_query_fail_count[1m])) by (kubernetes_node,node_ip,job) / sum(rate(dns_query_total_count[1m])) by (kubernetes_node,node_ip,job) * 100 > 0`
- See kuberentes/alert.rules for the alert rules.
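
To make the alert expression above more concrete, here is a minimal sketch of how the two counters could be kept and exposed using only the Go standard library. It is not taken from kube-dns-checker.go: the `counters` type and its methods are hypothetical, labels are omitted, and the real checker may well use a Prometheus client library instead. Only the metric names `dns_query_total_count` and `dns_query_fail_count` come from the expression above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"sync/atomic"
)

// counters holds the two counters named in the alert expression.
// The type and its methods are hypothetical, for illustration only.
type counters struct {
	total atomic.Uint64
	fail  atomic.Uint64
}

// record counts one DNS query and, if it failed, one failure.
func (c *counters) record(ok bool) {
	c.total.Add(1)
	if !ok {
		c.fail.Add(1)
	}
}

// failPercent returns the lifetime failure percentage. The PromQL
// expression computes the same ratio, but from rates over a 1m window.
func (c *counters) failPercent() float64 {
	total := c.total.Load()
	if total == 0 {
		return 0
	}
	return float64(c.fail.Load()) / float64(total) * 100
}

// write emits both counters in the Prometheus text exposition format.
func (c *counters) write(w io.Writer) {
	fmt.Fprintf(w, "# TYPE dns_query_total_count counter\ndns_query_total_count %d\n", c.total.Load())
	fmt.Fprintf(w, "# TYPE dns_query_fail_count counter\ndns_query_fail_count %d\n", c.fail.Load())
}

// handler serves the counters; it could be mounted at /metrics on port 8080.
func (c *counters) handler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		c.write(w)
	})
}

func main() {
	c := &counters{}
	// Simulate three successful lookups and one failure.
	for _, ok := range []bool{true, true, true, false} {
		c.record(ok)
	}
	fmt.Printf("fail rate: %.1f%%\n", c.failPercent()) // fail rate: 25.0%
	c.write(os.Stdout)
}
```

With one failure in four queries the ratio is 25%, which the `> 0` threshold in the alert expression would flag. A threshold of zero means any failed query within the window fires the alert.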
How To
go run kube-dns-checker.go
go build kube-dns-checker.go
docker build -t kube-dns-checker .
docker run -p8080:8080 kube-dns-checker
docker build -t radut/kube-dns-checker .
docker push radut/kube-dns-checker
Environment Variables
- `GO_RESOLVER`: boolean; use the internal Go resolver instead of dig. Default: false (use dig).
- `DOMAINS`: comma-separated domains to resolve, for example "www.google.com,www.cloudflare.com". Default: "www.google.com".
- `NAMESERVERS`: comma-separated servers to query, for example "DEFAULT,8.8.8.8". Default: "DEFAULT", which queries the server from /etc/resolv.conf.
- `TIMEOUT`: dig timeout in seconds. Default: '3s'. By default dig retries over TCP, with the same timeout.
- `INTERVAL`: interval between check runs. Default: '5s'.
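
The variables above could be read roughly as follows. This is a sketch under assumptions rather than the code in kube-dns-checker.go: the `Config` struct, `loadConfig`, and `splitList` are hypothetical names, and it assumes that `TIMEOUT` and `INTERVAL` are Go duration strings such as '3s' and that unset or invalid values fall back to the documented defaults.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// Config mirrors the documented environment variables.
// The struct and field names are hypothetical.
type Config struct {
	GoResolver  bool
	Domains     []string
	Nameservers []string
	Timeout     time.Duration
	Interval    time.Duration
}

// loadConfig reads settings through getenv (os.Getenv in normal use) and
// keeps the documented default when a variable is unset or invalid.
func loadConfig(getenv func(string) string) Config {
	cfg := Config{
		GoResolver:  false,
		Domains:     []string{"www.google.com"},
		Nameservers: []string{"DEFAULT"},
		Timeout:     3 * time.Second,
		Interval:    5 * time.Second,
	}
	if v, err := strconv.ParseBool(getenv("GO_RESOLVER")); err == nil {
		cfg.GoResolver = v
	}
	if v := splitList(getenv("DOMAINS")); len(v) > 0 {
		cfg.Domains = v
	}
	if v := splitList(getenv("NAMESERVERS")); len(v) > 0 {
		cfg.Nameservers = v
	}
	// Assumption: durations use Go syntax, e.g. "3s".
	if v, err := time.ParseDuration(getenv("TIMEOUT")); err == nil && v > 0 {
		cfg.Timeout = v
	}
	if v, err := time.ParseDuration(getenv("INTERVAL")); err == nil && v > 0 {
		cfg.Interval = v
	}
	return cfg
}

// splitList splits a comma-separated value, trimming spaces and
// dropping empty items.
func splitList(s string) []string {
	var out []string
	for _, item := range strings.Split(s, ",") {
		if item = strings.TrimSpace(item); item != "" {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	cfg := loadConfig(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps `loadConfig` easy to exercise without touching the real process environment.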