Recursor status is down

Question

Appendme opened this issue a year ago · 12 comments

Using the example Private Authoritative Server, the dnsdist web UI shows the recursor's status as down, and the recursor web UI lists a.root-servers.net/A in its Servfail domains table.

If I follow the steps described in #10 (comment), the recursor starts working.

My goal:
I have two Windows Server DNS servers, DC1 and DC2, and I want them to forward queries to pdns so they can resolve static records added through the admin web UI.

Answer 1 · 2023-09-24T16:39:03.000Z

I get the down status of recursor in webui dnsdist

I can not reproduce this behaviour using the referenced example. What I would try:

check if the recursor's IP matches the configured IP in dnsdist
check if dnsdist can access the recursor

The recursor's IP should be fixed by the following section in docker-compose.yml:

recursor:
    ipv4_address: 172.31.117.117

The default dnsdist config can be found here:
https://github.com/chrisss404/powerdns/blob/master/dnsdist/conf/conf.d/servers.conf

Query example.com

$ dig @127.0.0.1 -p 1053 example.com

; <<>> DiG 9.18.17 <<>> @127.0.0.1 -p 1053 example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38676
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;example.com.			IN	A

;; ANSWER SECTION:
example.com.		86400	IN	A	93.184.216.34

;; Query time: 147 msec
;; SERVER: 127.0.0.1#1053(127.0.0.1) (UDP)
;; WHEN: Sun Sep 24 18:08:37 CEST 2023
;; MSG SIZE  rcvd: 56

Query test.sys configured via pdns

$ dig @127.0.0.1 -p 1053 test.sys

; <<>> DiG 9.18.17 <<>> @127.0.0.1 -p 1053 test.sys
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37631
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;test.sys.			IN	A

;; ANSWER SECTION:
test.sys.		60	IN	A	10.0.0.1

;; Query time: 6 msec
;; SERVER: 127.0.0.1#1053(127.0.0.1) (UDP)
;; WHEN: Sun Sep 24 18:08:34 CEST 2023
;; MSG SIZE  rcvd: 53

I want to add forwarding to pdns to receive static records added through admin webui

The recursor in this example performs DNSSEC validation. Either turn validation off by setting the environment variable RECURSOR_DNSSEC to off, or enable DNSSEC for your TLD and configure the matching trust anchor in the recursor using the environment variable RECURSOR_TRUST_ANCHORS.
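As a rough sketch of the first option, the snippet below writes a small Compose file that only sets RECURSOR_DNSSEC to off and can be layered on top of the example. The service name recursor and the file name dnssec-off.yml are assumptions, not taken from the example repository, so adjust them to match your compose file.

```shell
# Write an extra Compose file that disables DNSSEC validation in the recursor.
# Assumptions: the service is called "recursor"; "dnssec-off.yml" is an arbitrary name.
cat > dnssec-off.yml <<'EOF'
services:
  recursor:
    environment:
      - RECURSOR_DNSSEC=off
EOF

# Then start the stack with both files (later -f files extend earlier ones):
#   docker compose -f private-authoritative.yml -f dnssec-off.yml up -d
```

Keeping the change in a separate file leaves the example compose file untouched, which makes it easy to turn validation back on later.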

HTH & BR
Christian

Answer 2 · 2023-09-24T17:21:11.000Z

Thanks for the answer. I just checked on another computer and it works there. Perhaps the problem is the old Docker version on the machine where the failure occurs. I'll check that guess a little later.

Answer 3 · 2023-09-25T10:02:26.000Z

Updating Docker didn't help. I also tried another server and got the same error as on the first one. The computer from my previous answer, where it works, runs Debian; the machines where it fails run Ubuntu. From my point of view, that is the only difference between them.

Here is my config:
https://gist.github.com/Appendme/f690b6b82320978bcfb2e57481a43681

dnsdist can access the recursor:

>docker compose exec dnsdist sh
/ # apk add bind-tools
/ # dig @172.31.117.117 example.com
;; communications error to 172.31.117.117#53: timed out
;; communications error to 172.31.117.117#53: timed out

; <<>> DiG 9.18.19 <<>> @172.31.117.117 example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 14459
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;example.com.                   IN      A

;; Query time: 0 msec
;; SERVER: 172.31.117.117#53(172.31.117.117) (UDP)
;; WHEN: Mon Sep 25 09:17:28 UTC 2023
;; MSG SIZE  rcvd: 40

/ # ping 172.31.117.117
PING 172.31.117.117 (172.31.117.117): 56 data bytes
64 bytes from 172.31.117.117: seq=0 ttl=64 time=0.121 ms
64 bytes from 172.31.117.117: seq=1 ttl=64 time=0.086 ms
64 bytes from 172.31.117.117: seq=2 ttl=64 time=0.049 ms
64 bytes from 172.31.117.117: seq=3 ttl=64 time=0.105 ms
^C
--- 172.31.117.117 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.049/0.090/0.121 ms

recursor logs

Answer 4 · 2023-09-25T15:23:48.000Z

dnsdist can access to recursor:

Not on port 53, there is a communication error and the query status is SERVFAIL instead of NOERROR:

;; communications error to 172.31.117.117#53: timed out
;; communications error to 172.31.117.117#53: timed out

This is what it looks like in my dnsdist container:

$ docker-compose -f private-authoritative.yml exec dnsdist sh
/ # apk add bind-tools
/ # dig @172.31.117.117 example.com

; <<>> DiG 9.18.19 <<>> @172.31.117.117 example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26566
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;example.com.			IN	A

;; ANSWER SECTION:
example.com.		86353	IN	A	93.184.216.34

;; Query time: 0 msec
;; SERVER: 172.31.117.117#53(172.31.117.117) (UDP)
;; WHEN: Mon Sep 25 15:06:04 UTC 2023
;; MSG SIZE  rcvd: 56
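
To compare the two outputs without reading the full dig text, a small helper can pull out just the response status (NOERROR, SERVFAIL, ...). This is only a sketch: the function name dig_status is made up for illustration, and it assumes dig's usual header line format as shown above.

```shell
# Print the response status (e.g. NOERROR or SERVFAIL) from dig output on stdin.
# "dig_status" is a hypothetical helper name, not part of dig or dnsdist.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Example use, run inside the dnsdist container after "apk add bind-tools":
#   dig @172.31.117.117 example.com | dig_status
```

Anything other than NOERROR for a name that should resolve suggests the recursor itself cannot reach upstream servers, even though ping to the container succeeds.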

From what you describe the only thing that comes to mind is that it might be related to firewall (iptables) rules. I would try to compare the iptables rules between your debian and ubuntu host using: /sbin/iptables -L -n

HTH & Good luck

Answer 5 · 2023-09-25T17:54:19.000Z

On the working server the FORWARD policy is ACCEPT, but setting that did not help.

Answer 6 · 2023-09-25T18:06:05.000Z

I think the problem is Bad file descriptor errors on recursor: logs

Answer 7 · 2023-09-25T18:32:34.000Z

I think the problem is Bad file descriptor errors on recursor: logs

This could also be the reason, it definitely shouldn't be there, see my logs: recursor.log

You can also try using one of the release versions instead of latest, e.g.:

-image: chrisss404/powerdns:latest-recursor
+image: chrisss404/powerdns:4.9.1-recursor

Answer 8 · 2023-09-25T20:57:34.000Z

There may be a problem with access to the root DNS servers. I'll try changing the health check in dnsdist later.

Answer 9 · 2023-09-28T06:31:12.000Z

Yes, the problem was the health check, which failed because of problems with access to the root DNS servers. Thanks for answering.

Answer 10 · 2023-09-29T10:08:07.000Z

Great that you resolved your issue. Can you share how you identified the root cause of not being able to resolve a.root-servers.net/A on your host?

In case someone else runs into a similar issue, this is how you can adapt the dnsdist healthcheck:

dnsdist documentation regarding healthchecks: https://dnsdist.org/guides/downstreams.html#healthcheck
create a custom servers.conf based on https://github.com/chrisss404/powerdns/blob/master/dnsdist/conf/conf.d/servers.conf
mount it into your container overriding the default, e.g.:

volumes:
  - ./servers.conf:/etc/dnsdist/conf.d/servers.conf:ro
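
A minimal sketch of such a custom servers.conf is written by the snippet below. It is not the repository's default file: the server name and the health check name test.sys. are assumptions (test.sys is the zone from the example earlier in this thread), so pick a name your recursor can reliably answer and carry over any other settings from the default servers.conf.

```shell
# Write a custom dnsdist servers.conf with a different health check name.
# Assumptions: the recursor listens on 172.31.117.117:53 (as in this thread)
# and can answer test.sys.; the name "recursor" is only a label.
cat > servers.conf <<'EOF'
-- By default dnsdist health-checks with a query for a.root-servers.net. A;
-- checkName/checkType replace that query for this backend.
newServer({
  address="172.31.117.117:53",
  name="recursor",
  checkName="test.sys.",
  checkType="A"
})
EOF
```

After writing the file, recreate the dnsdist container so the mounted file is picked up, then check the backend status in the dnsdist web UI.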

Answer 11 · 2023-09-29T12:17:38.000Z

In the recursor container, I noticed that requests to powerdns.com were being dropped. Running dig with the +trace parameter showed something strange: the requests to the TLD and gTLD servers seem to go through, but after that they seem to fail. In the end I changed the health check host in dnsdist, and that was it: it worked.

Answer 12 · 2023-09-29T14:22:12.000Z

Thanks for your answer.

It seems that your recursor is not working properly as it is unable to fulfill one of its main purposes, namely resolving top level domains.
If you don't want to resolve domains other than the ones configured in your authoritative server, then you might not need a recursor at all.