replicatedhq/troubleshoot

Be able to detect search domain misconfiguration

Closed this issue · 0 comments

Describe the rationale for the suggested feature.
When a pod's DNS search list contains a domain that has a wildcard DNS record (e.g. the search domain example.com with a record for *.example.com), Kubernetes can inadvertently resolve in-cluster names through that search domain instead of through cluster DNS. For example, test.svc.cluster.local may be resolved as test.svc.cluster.local.example.com.
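
To illustrate why the wildcard wins, here is a minimal sketch of the order in which a stub resolver tries candidate names. It assumes the usual pod resolv.conf written by Kubernetes (ndots:5 and a search list ending in the host's search domain). The function name expand_candidates is hypothetical and only models the ordering; it performs no lookups.

```shell
# Print the names a stub resolver would try, in order, for a given query.
# Usage: expand_candidates NAME NDOTS SEARCH_DOMAIN...
expand_candidates() {
  name=$1; ndots=$2; shift 2
  # A trailing dot marks the name as fully qualified: no search list is applied.
  case $name in
    *.) printf '%s\n' "$name"; return ;;
  esac
  # Count the dots in the name.
  only_dots=$(printf '%s' "$name" | tr -cd '.')
  dots=${#only_dots}
  # With at least ndots dots, the name is tried as-is before the search list.
  if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
  for domain in "$@"; do printf '%s.%s.\n' "$name" "$domain"; done
  # With fewer dots, the name is tried as-is only after the search list.
  if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
}

# Assumed search list for a pod in the default namespace, plus example.com.
expand_candidates kubernetes.default.svc.cluster.local 5 \
  default.svc.cluster.local svc.cluster.local cluster.local example.com
```

The name has only four dots, so every search domain is tried first. The three cluster suffixes produce names that do not exist, but the example.com candidate matches the wildcard record, so the resolver never reaches the real in-cluster name.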

Describe the feature
To detect this misconfiguration, we can compare the result of a DNS query made from inside the cluster with the cluster IP that the Kubernetes API reports for a service. For portability, we can use the kubernetes service in the default namespace, since it is guaranteed to exist.

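We can compare the outputs of:
kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}'
and
dig +search +short kubernetes.default.svc.cluster.local
run from inside a pod. Note that dig only applies the search list from resolv.conf when +search is given, so the flag is needed for the query to behave like an ordinary lookup from the pod.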

There are three possible states for the output:

  1. The outputs match, and it's likely that DNS is configured correctly.
  2. dig gets NXDOMAIN (with +short this shows up as empty output), and it's likely that CoreDNS is down or misconfigured.
  3. The outputs differ, and it's likely that the search domains are interfering with cluster name resolution.
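
The three states above can be sketched as a small shell function. This is only an illustration, not an existing troubleshoot feature: the names classify_dns and run_check are hypothetical, and run_check assumes a running pod (passed as its first argument) whose image includes dig.

```shell
# Classify the comparison between the API's cluster IP and the DNS answer.
# Usage: classify_dns EXPECTED_IP RESOLVED_OUTPUT
classify_dns() {
  expected=$1
  resolved=$2
  if [ -z "$resolved" ]; then
    # Empty "dig +short" output: no answer, e.g. NXDOMAIN or DNS unreachable.
    echo "no-answer"
  elif [ "$resolved" = "$expected" ]; then
    echo "match"
  else
    # Any other output, including extra lines such as a CNAME, is a mismatch.
    echo "mismatch"
  fi
}

# Gather both values from a live cluster (defined here, not called).
# Usage: run_check POD_NAME
run_check() {
  expected=$(kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}')
  resolved=$(kubectl exec "$1" -- dig +search +short kubernetes.default.svc.cluster.local)
  classify_dns "$expected" "$resolved"
}

# Offline demonstration with example addresses.
classify_dns 10.96.0.1 10.96.0.1
classify_dns 10.96.0.1 ""
classify_dns 10.96.0.1 203.0.113.7
```

The demonstration calls cover the match, no-answer, and mismatch cases in that order. The addresses are placeholders.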

It might be necessary to write a custom analyzer for this, since it requires comparing data from both the cluster-resources collector and a custom run-pod collector.

Describe alternatives you've considered

It might be possible to chain together existing collectors to output a file in a format that a text analyzer can compare.
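
One way this alternative might look, as a rough sketch: the script run inside the pod does the comparison itself and prints a single key=value line that a regex-based text analyzer could match. The function name format_result and the dns_check= line format are hypothetical. The sketch also assumes the expected IP can be read from the KUBERNETES_SERVICE_HOST environment variable that Kubernetes normally injects into pods, which would avoid needing the cluster-resources output at all.

```shell
# Emit one key=value line so a regex-based analyzer can match on it.
# Usage: format_result EXPECTED_IP RESOLVED_OUTPUT
format_result() {
  # Join multi-line dig output (e.g. a CNAME followed by an address) with commas.
  resolved=$(printf '%s' "$2" | tr '\n' ',')
  if [ -z "$resolved" ]; then
    status=no-answer
  elif [ "$resolved" = "$1" ]; then
    status=match
  else
    status=mismatch
  fi
  printf 'dns_check=%s expected=%s resolved=%s\n' "$status" "$1" "$resolved"
}

# Offline demonstration with an example address.
format_result 10.96.0.1 10.96.0.1

# Inside a pod, the call might instead be:
#   format_result "$KUBERNETES_SERVICE_HOST" \
#     "$(dig +search +short kubernetes.default.svc.cluster.local)"
```

An analyzer would then only need to look for dns_check=mismatch or dns_check=no-answer in the collected output.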