coredns/kubernetai

SERVFAIL on fallthrough to forward


When trying to use kubernetai with the following configuration

.:53 {
  errors

  kubernetai cluster-a.local in-addr.arpa ip6.arpa {
    fallthrough in-addr.arpa ip6.arpa
    kubeconfig /Users/user/.kube/config arn:aws:eks:us-east-1:000000000000:cluster/a
    pods insecure
  }

  kubernetai cluster-b.local in-addr.arpa ip6.arpa {
    fallthrough in-addr.arpa ip6.arpa
    kubeconfig /Users/user/.kube/config arn:aws:eks:us-east-1:000000000000:cluster/b
    pods insecure
  }

  forward . /etc/resolv.conf {
    max_concurrent 1000
  }
}

and using dig to query the PTR record of an IP address

dig @127.0.0.1 -x 1.1.1.1

CoreDNS logs an error saying that no next plugin was found

.:53
CoreDNS-1.8.3
darwin/amd64, go1.16.3, 4293992b-dirty
[ERROR] plugin/errors: 2 1.1.1.1.in-addr.arpa. PTR: plugin/kubernetes: no next plugin found specified

and answers the request with SERVFAIL

; <<>> DiG 9.10.6 <<>> -x 1.1.1.1 @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 57076
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;1.1.1.1.in-addr.arpa.          IN      PTR

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu May 06 17:11:41 -03 2021
;; MSG SIZE  rcvd: 49

instead of using the forward plugin configuration.
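
For context, that error text looks like what CoreDNS's plugin.NextOrFailure helper produces when a handler's Next is nil; roughly (a paraphrased sketch of the helper from the coredns plugin package, tracing details omitted):

// Paraphrased sketch of plugin.NextOrFailure (tracing omitted): with a nil
// next handler it returns SERVFAIL and a "no next plugin found" error,
// which is what shows up in the log above.
func NextOrFailure(name string, next Handler, ctx context.Context, w dns.ResponseWriter, r *dns.Msg) (int, error) {
	if next != nil {
		return next.ServeDNS(ctx, w, r)
	}
	return dns.RcodeServerFailure, Error(name, errors.New("no next plugin found"))
}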

I don't know if it would be appropriate, but patching

		} else {
			rcode, err = k.ServeDNS(ctx, w, r)
		}
to

		} else {
			k.Next = k8i.Next
			rcode, err = k.ServeDNS(ctx, w, r)
		}

solved my problem.

Thanks @t0rr3sp3dr0! At first glance, that seems like the right fix.

Actually, I can only set Next if k8i.Fall[i].Through(state.Name()) is true. I'll submit a PR with the changes.
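
Roughly, the amended patch would look something like this (a sketch only; i, k, state, and the surrounding loop over k8i.Kubernetes are as in kubernetai's ServeDNS):

		} else {
			// Only hand kubernetai's Next to this instance when fallthrough
			// applies to the query name; otherwise leave it nil.
			if k8i.Fall[i].Through(state.Name()) {
				k.Next = k8i.Next
			}
			rcode, err = k.ServeDNS(ctx, w, r)
		}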

OK. On second look - it probably would be better to set it at setup time.

dnsserver.GetConfig(c).AddPlugin(func(next plugin.Handler) plugin.Handler {
	k8i.Next = next
	return k8i
})

In there, set the Next of the last instance in k8i.Kubernetes to next ...
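
Something along these lines, presumably (a hypothetical sketch; only the extra assignment is new):

dnsserver.GetConfig(c).AddPlugin(func(next plugin.Handler) plugin.Handler {
	k8i.Next = next
	// Hypothetical: also point the last Kubernetes instance at the next
	// plugin so its own fallthrough can reach it (e.g. forward).
	if n := len(k8i.Kubernetes); n > 0 {
		k8i.Kubernetes[n-1].Next = next
	}
	return k8i
})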

Instead of setting Next on the last Kubernetes instance, I think it would be better to remove the condition that keeps the fall zones of the last Kubernetes instance non-nil, and then remove the code path that treats the last Kubernetes instance as a special case. This also makes the code simpler.

So this piece of code

		// for all but last instance, disable fallthrough in the Kubernetes object
		if i < len(k8i.Kubernetes)-1 {
			k8i.Kubernetes[i].Fall.Zones = nil
		}
would become

		// for all instances, disable fallthrough in the Kubernetes object
		k8i.Kubernetes[i].Fall.Zones = nil

And

		// If fallthrough is enabled and there are more kubernetes in the list, then we
		// should continue to the next kubernetes in the list (not next plugin) when
		// ServeDNS results in NXDOMAIN.
		if i != (len(k8i.Kubernetes)-1) && k8i.Fall[i].Through(state.Name()) {
			// Use a non-writer so we don't write NXDOMAIN to client
			nw := nonwriter.New(w)
			_, err := k.ServeDNS(ctx, nw, r)
			// Return SERVFAIL if error
			if err != nil {
				return dns.RcodeServerFailure, err
			}
			// If NXDOMAIN, continue to next kubernetes instead of next plugin
			if nw.Msg.Rcode == dns.RcodeNameError {
				continue
			}
			// Otherwise write message to client
			m := nw.Msg
			state.SizeAndDo(m)
			m = state.Scrub(m)
			w.WriteMsg(m)
			return m.Rcode, err
		} else {
			rcode, err = k.ServeDNS(ctx, w, r)
		}
		return rcode, err
would become

		// If fallthrough is disabled, then we should not continue to the next plugin.
		if !k8i.Fall[i].Through(state.Name()) {
			return k.ServeDNS(ctx, w, r)
		}

		// Use a non-writer so we don't write NXDOMAIN to client
		nw := nonwriter.New(w)

		_, err := k.ServeDNS(ctx, nw, r)

		// Return SERVFAIL if error
		if err != nil {
			return dns.RcodeServerFailure, err
		}

		// If NXDOMAIN, continue to next kubernetes instead of next plugin
		if nw.Msg.Rcode == dns.RcodeNameError {
			continue
		}

		// Otherwise write message to client
		m := nw.Msg
		state.SizeAndDo(m)
		m = state.Scrub(m)
		w.WriteMsg(m)

		return m.Rcode, err

Previously,

return plugin.NextOrFailure(k8i.Name(), k8i.Next, ctx, w, r)
was never reached, because the last Kubernetes instance would return an error due to its Next being nil. With this approach, that line is now reached and handles the fallthrough properly.
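
Schematically, ServeDNS then ends up with this control flow (a simplified sketch, not the literal code):

	for range k8i.Kubernetes {
		// per-instance handling as in the snippet above: answer the query,
		// continue to the next instance on NXDOMAIN when fallthrough
		// applies, or return the instance's answer directly
	}
	// With no instance treated specially, a query that falls through every
	// instance reaches this line and is handed to the next plugin in the
	// chain, e.g. forward.
	return plugin.NextOrFailure(k8i.Name(), k8i.Next, ctx, w, r)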