Duplicate DNS-queries with v0.61.0
iJorgen opened this issue ยท 11 comments
I'm seeing duplicate DNS-queries with v0.61.0 and it seems to query two upstream servers for every DNS-request.
The release note says something about "imp load balance". Is this a new feature trying to do some load-balancing between the upstreams?! Seems like it's not working for me and the same query is forwarded to two upstream servers. All worked fine with v0.60.1 and before.
I'm not using "all-servers" or "fastest-addr", so just want DNSProxy to use my fastest (RTT) upstream.
UPDATE:
During the first 10-20 minutes after starting up DNSProxy there are no duplicates, but after that the duplicates show up (...if that is a clue).
When looking at the DNS-logs it seems to use ALL upstream-servers now and does ONE query before moving to the next. Guess this is the "imp load balance" mentioned in the notes. I have specified 16 upstream servers (different cities, datacenters and protocols) with DNS-stamps. Does DNSProxy keep an open connection to ALL these 16 servers or does it need to do a new TLS/Quic-handshake for each request if the connection was closed?!
I liked better when DNSProxy found the fastest server based on RTT and held on to that as long as no other server was faster. Load-balancing will probably work best for people just using a few servers like 8.8.8.8 and 8.8.4.4 as their upstreams, but not for more complex setups. Can we maybe get a config-option to choose the upstream-strategy like "round-robin: true/false"?!
Could you please enable verbose logging in dnsproxy and see if it actually sends duplicate queries?
Here is a verbose log as requested. Seems to be two queries (DoQ and DoH3) in the first dump below.
One thing I just noticed is that the duplicates only seems to appear from Apple-devices (tested on iOS, iPadOS, TvOS) both wireless and wired. Can that be a clue?! Not using Private Relay and tested to remove the mobile-config profile. Still the same. Really strange, because no problems at all with duplicates before DNSProxy v0.61.0.
Example from an iPad:
Dec 19 12:40:38 admin: ;www.example.com. IN A
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#158 [debug] dnsproxy: handling new udp packet from 127.0.0.1:21365
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#158 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 49467
Dec 19 12:40:38 admin: ;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
Dec 19 12:40:38 admin: ;; QUESTION SECTION:
Dec 19 12:40:38 admin: ;www.example.com. IN AAAA
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#158 [debug] IPv6 is disabled. Reply with NoError to www.example.com. AAAA request
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#158 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 49467
Dec 19 12:40:38 admin: ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
Dec 19 12:40:38 admin: ;; QUESTION SECTION:
Dec 19 12:40:38 admin: ;www.example.com. IN AAAA
Dec 19 12:40:38 admin: ;; AUTHORITY SECTION:
Dec 19 12:40:38 admin: www.example.com. 10 IN SOA fake-for-negative-caching.adguard.com. hostmaster.www.example.com. 100500 1800 60 604800 86400
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#168 [debug] dnsproxy: handling new udp packet from 127.0.0.1:30933
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#168 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 6519
Dec 19 12:40:38 admin: ;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
Dec 19 12:40:38 admin: ;; QUESTION SECTION:
Dec 19 12:40:38 admin: ;www.example.com. IN HTTPS
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#168 [debug] dnsproxy: https://doh3.dns.nextdns.io:443/<redacted>/DK-ZEP-DoH3: sending request over udp: HTTPS www.example.com.
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#159 [debug] dnsproxy: upstream quic://DK-ANX-DoQ-<redacted>.dns.nextdns.io:853 successfully finished exchange of ;www.example.com. IN A; elapsed 94.447778ms
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#159 [debug] proxy: replying from upstream: rtt is 94.618739ms
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#159 [debug] Override TTL from 86400 to 28800
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#159 [debug] Override TTL from 86400 to 28800
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#159 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 40331
Dec 19 12:40:38 admin: ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
Dec 19 12:40:38 admin: ;; QUESTION SECTION:
Dec 19 12:40:38 admin: ;www.example.com. IN A
Dec 19 12:40:38 admin: ;; ANSWER SECTION:
Dec 19 12:40:38 admin: www.example.com. 28800 IN A 93.184.216.34
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#168 [debug] dnsproxy: https://doh3.dns.nextdns.io:443/<redacted>/DK-ZEP-DoH3: response received over udp: "ok"
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#168 [debug] dnsproxy: upstream https://doh3.dns.nextdns.io:443/<redacted>/DK-ZEP-DoH3 successfully finished exchange of ;www.example.com. IN HTTPS; elapsed 301.3317ms
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#168 [debug] proxy: replying from upstream: rtt is 301.46636ms
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#168 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 6519
Dec 19 12:40:38 admin: ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
Example from a wired client. Looks so much "cleaner"...
Dec 19 13:07:20 jorgen: ;; QUESTION SECTION:
Dec 19 13:07:20 jorgen: ;www.bauhaus.se. IN A
Dec 19 13:07:20 jorgen: 2023/12/19 13:07:20 7216#1751 [debug] dot upstream: using existing conn 38.175.117.129:853
Dec 19 13:07:20 jorgen: 2023/12/19 13:07:20 7216#1751 [debug] dnsproxy: tls://DK-ZEP-DoT-<redacted>.dns.nextdns.io:853: sending request over tcp: A www.bauhaus.se.
Dec 19 13:07:20 jorgen: 2023/12/19 13:07:20 7216#1751 [debug] dnsproxy: tls://DK-ZEP-DoT-<redacted>.dns.nextdns.io:853: response received over tcp: "ok"
Dec 19 13:07:20 jorgen: 2023/12/19 13:07:20 7216#1751 [debug] dnsproxy: upstream tls://DK-ZEP-DoT-<redacted>.dns.nextdns.io:853 successfully finished exchange of ;www.bauhaus.se. IN A; elapsed 47.508116ms
Dec 19 13:07:20 jorgen: 2023/12/19 13:07:20 7216#1751 [debug] proxy: replying from upstream: rtt is 47.644438ms
Dec 19 13:07:20 jorgen: 2023/12/19 13:07:20 7216#1751 [debug] Override TTL from 300 to 600
Dec 19 13:07:20 jorgen: 2023/12/19 13:07:20 7216#1751 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 57446
Dec 19 13:07:20 jorgen: ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
Dec 19 13:07:20 jorgen: ;; QUESTION SECTION:
Dec 19 13:07:20 jorgen: ;www.bauhaus.se. IN A
Dec 19 13:07:20 jorgen: ;; ANSWER SECTION:
Dec 19 13:07:20 jorgen: www.bauhaus.se. 600 IN A 82.98.126.1
@iJorgen, well, it seems that there are actually no "duplicate" queries, all these similar requests have different types and are used for different purposes in DNS. AFAIK, both A
and AAAA
as well as HTTPS
are normally sent by the majority of modern browsers to get as much information as possible about the host. Are you sure this wasn't the case in earlier versions? Could you please try to capture such logs with v0.60.1
for example?
I rolled back to v0.60.1 and monitored the logs for 30 mins. Not a single duplicate... So strange!! The only thing changed is the DNSProxy executable on the router. Nothing else...
As mentioned it's only my Apple-devices that makes the duplicates. Also using Google/Nest speakers, Ubuntu, Windows and a bunch of IoT, but they don't generate any duplicates. It's also not just using a browser (Safari) that makes the duplicates. All requests from other apps are also duplicated. No new feature in the latest OS from Apple?! (17.2)
It's also never the same DNS-server that answers twice, so seems like after the first query it changes and makes another one.
Also tried to self-compile with Go 1.21.5, but no change.
Could it be on NextDNS side?!?! Maybe they don't handle this type of "Round-Robin" load balancing where queries are received on so many different servers/protocols/datacenters the same time and concatenate the logs?!
Anything else to try?!
Was thinking about this above:
it seems that there are actually no "duplicate" queries, all these similar requests have different types and are used for different purposes in DNS.
Dec 19 12:40:38 admin: ;; QUESTION SECTION:
Dec 19 12:40:38 admin: ;www.example.com. IN HTTPS
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#168 [debug] dnsproxy: https://doh3.dns.nextdns.io:443/<redacted>/DK-ZEP-DoH3: sending request over udp: HTTPS www.example.com.
Dec 19 12:40:38 admin: 2023/12/19 12:40:38 7216#159 [debug] dnsproxy: upstream quic://DK-ANX-DoQ-<redacted>.dns.nextdns.io:853 successfully finished exchange of ;www.example.com. IN A; elapsed 94.447778ms
If DNSProxy queries several types of records and v0.61.0 changes server for every request, maybe this is correct to see several entries in the log?! Seems v0.61.0 is eager to change source, before it's finished. With older releases it often used the same server for hours, so maybe the duplicates were not seen then?! (...but still strange it's only Apple-devices)
@iJorgen, sorry for not asking this at the beginning, but where do these "duplicates" appear? I mean, since dnsproxy now chooses the server randomly, probing server's RTT, it's competely possible to choose the same fastest server several times in a row for consequent requests.
No worries, @EugeneOne1 :-)
It's in the log on the NextDNS admin pages. I tag every upstream so I can see what location and protocol is being used. Here I can see the same DNS-query two times in a row, but using different location/protocol.
To summarize this case so far:
- If using any version of DNSProxy before v0.61.0, it's working as expected.
- No changes in the config between tests. Just changing the DNSProxy executable.
- Only Apple-devices (both wired and wireless) creates the duplicate queries in the log.
- DNSProxy always switches to another DNS-server before making the duplicated DNS-query.
Logically it must be some changes/bug in the latest release of DNSProxy how DNS-queries are made or the logic of switching servers (the new load balancing code) combined with how Apple-devices makes their queries.
I have found the issue and it was a deep one... ๐
All Apple-devices since 2020 makes two DNS-queries; one to the local resolver and one "Type 65" query to their own DNS-servers. My setup catches all DNS-traffic trying to bypass the router and redirecting it to 127.0.0.1:53, which gives the two queries every time.
With the new version of DNSProxy v0.61.0 these two queries gets visible in the DNS-logs since DNSProxy now switches to a new random server after each query. Before there were also two queries but were not shown in NextDNS logs, probably because they occurred to the same server/protocol within a few milliseconds. Now it's different datacenters and the logs probably don't sync together that fast, so it's counted as two separate DNS-queries.
My fix is to block all Type 65 queries in the firewall:
iptables -I INPUT -p udp --dport 53 -d $(nvram get lan_ipaddr) -m string --hex-string "|0000410001|" --algo bm -j REJECT
iptables -I FORWARD -p udp --dport 53 -m string --hex-string "|0000410001|" --algo bm -j REJECT
All is working perfect now and logs are clean again without duplicates.
Any chance that DNSProxy can handle/recognize/block "Type 65" queries in the future?!
@iJorgen, hello again. Resource records of type 65 (HTTPS) are completely useful in different DNS scenarios, like DDR. Consider using AdGuard Home for filtering such records more precisely.
For now, I suppose this issue may be closed.