coredns/kubernetai

SERVFAIL on fallthrough to forward


When trying to use kubernetai with the following configuration

.:53 {
  errors

  kubernetai cluster-a.local in-addr.arpa ip6.arpa {
    fallthrough in-addr.arpa ip6.arpa
    kubeconfig /Users/user/.kube/config arn:aws:eks:us-east-1:000000000000:cluster/a
    pods insecure
  }

  kubernetai cluster-b.local in-addr.arpa ip6.arpa {
    fallthrough in-addr.arpa ip6.arpa
    kubeconfig /Users/user/.kube/config arn:aws:eks:us-east-1:000000000000:cluster/b
    pods insecure
  }

  forward . /etc/resolv.conf {
    max_concurrent 1000
  }
}

and using dig to query the PTR record of an IP address

dig @127.0.0.1 -x 1.1.1.1

CoreDNS logs an error saying that no next plugin was found

.:53
CoreDNS-1.8.3
darwin/amd64, go1.16.3, 4293992b-dirty
[ERROR] plugin/errors: 2 1.1.1.1.in-addr.arpa. PTR: plugin/kubernetes: no next plugin found specified

and answers the request with SERVFAIL

; <<>> DiG 9.10.6 <<>> -x 1.1.1.1 @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 57076
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;1.1.1.1.in-addr.arpa.          IN      PTR

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu May 06 17:11:41 -03 2021
;; MSG SIZE  rcvd: 49

instead of using the forward plugin configuration.
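
For context, that error text looks like what CoreDNS's plugin.NextOrFailure helper produces when a handler's Next is nil; roughly (a paraphrased sketch of the helper from the coredns plugin package, tracing details omitted):

// Paraphrased sketch of plugin.NextOrFailure (tracing omitted): with a nil
// next handler it returns SERVFAIL and a "no next plugin found" error,
// which is what shows up in the log above.
func NextOrFailure(name string, next Handler, ctx context.Context, w dns.ResponseWriter, r *dns.Msg) (int, error) {
	if next != nil {
		return next.ServeDNS(ctx, w, r)
	}
	return dns.RcodeServerFailure, Error(name, errors.New("no next plugin found"))
}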

I don't know if it would be appropriate, but patching

		} else {
			rcode, err = k.ServeDNS(ctx, w, r)
		}
to

		} else {
			k.Next = k8i.Next
			rcode, err = k.ServeDNS(ctx, w, r)
		}

solved my problem.

Thanks @t0rr3sp3dr0! At first glance, that seems like the right fix.

Actually, I can only set Next if k8i.Fall[i].Through(state.Name()) is true. I'll submit a PR with the changes.
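
Roughly, the amended patch would look something like this (a sketch only; i, k, state, and the surrounding loop over k8i.Kubernetes are as in kubernetai's ServeDNS):

		} else {
			// Only hand kubernetai's Next to this instance when fallthrough
			// applies to the query name; otherwise leave it nil.
			if k8i.Fall[i].Through(state.Name()) {
				k.Next = k8i.Next
			}
			rcode, err = k.ServeDNS(ctx, w, r)
		}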

OK. On second look - it probably would be better to set it at setup time.

dnsserver.GetConfig(c).AddPlugin(func(next plugin.Handler) plugin.Handler {
	k8i.Next = next
	return k8i
})

In there, set the Next of the last instance in k8i.Kubernetes to next ...
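
Something along these lines, presumably (a hypothetical sketch; only the extra assignment is new):

dnsserver.GetConfig(c).AddPlugin(func(next plugin.Handler) plugin.Handler {
	k8i.Next = next
	// Hypothetical: also point the last Kubernetes instance at the next
	// plugin so its own fallthrough can reach it (e.g. forward).
	if n := len(k8i.Kubernetes); n > 0 {
		k8i.Kubernetes[n-1].Next = next
	}
	return k8i
})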

Instead of setting Next on the last Kubernetes instance, I think it would be better to remove the condition that keeps the fall zones of the last Kubernetes instance non-nil, and then remove the code path that treats the last Kubernetes instance as a special case. This also makes the code simpler.

So this piece of code

		// for all but last instance, disable fallthrough in the Kubernetes object
		if i < len(k8i.Kubernetes)-1 {
			k8i.Kubernetes[i].Fall.Zones = nil
		}
would become

		// for all instances, disable fallthrough in the Kubernetes object
		k8i.Kubernetes[i].Fall.Zones = nil

And

		// If fallthrough is enabled and there are more kubernetes in the list, then we
		// should continue to the next kubernetes in the list (not next plugin) when
		// ServeDNS results in NXDOMAIN.
		if i != (len(k8i.Kubernetes)-1) && k8i.Fall[i].Through(state.Name()) {
			// Use a non-writer so we don't write NXDOMAIN to client
			nw := nonwriter.New(w)
			_, err := k.ServeDNS(ctx, nw, r)
			// Return SERVFAIL if error
			if err != nil {
				return dns.RcodeServerFailure, err
			}
			// If NXDOMAIN, continue to next kubernetes instead of next plugin
			if nw.Msg.Rcode == dns.RcodeNameError {
				continue
			}
			// Otherwise write message to client
			m := nw.Msg
			state.SizeAndDo(m)
			m = state.Scrub(m)
			w.WriteMsg(m)
			return m.Rcode, err
		} else {
			rcode, err = k.ServeDNS(ctx, w, r)
		}
		return rcode, err
would become

		// If fallthrough is disabled, then we should not continue to the next plugin.
		if !k8i.Fall[i].Through(state.Name()) {
			return k.ServeDNS(ctx, w, r)
		}

		// Use a non-writer so we don't write NXDOMAIN to client
		nw := nonwriter.New(w)

		_, err := k.ServeDNS(ctx, nw, r)

		// Return SERVFAIL if error
		if err != nil {
			return dns.RcodeServerFailure, err
		}

		// If NXDOMAIN, continue to next kubernetes instead of next plugin
		if nw.Msg.Rcode == dns.RcodeNameError {
			continue
		}

		// Otherwise write message to client
		m := nw.Msg
		state.SizeAndDo(m)
		m = state.Scrub(m)
		w.WriteMsg(m)

		return m.Rcode, err

Previously,

return plugin.NextOrFailure(k8i.Name(), k8i.Next, ctx, w, r)
was never reached, because the last Kubernetes instance would return an error due to its Next being nil. With this approach, that line is now reached and handles the fallthrough properly.
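
Schematically, ServeDNS then ends up with this control flow (a simplified sketch, not the literal code):

	for range k8i.Kubernetes {
		// per-instance handling as in the snippet above: answer the query,
		// continue to the next instance on NXDOMAIN when fallthrough
		// applies, or return the instance's answer directly
	}
	// With no instance treated specially, a query that falls through every
	// instance reaches this line and is handed to the next plugin in the
	// chain, e.g. forward.
	return plugin.NextOrFailure(k8i.Name(), k8i.Next, ctx, w, r)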