fastmail/authentication_milter

Lots of DNS errors if no "dns_resolvers" specified

PHPGangsta opened this issue · 4 comments

Hello,

if I don't specify any dns_resolvers I get lots of DNS errors:

Oct 16 00:35:22 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: DNS Lookup MX domain.de error, hold set on domain.de : query timed out
Oct 16 00:35:22 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: DNS Lookup MX domain.de error, hold set on domain.de : query timed out
Oct 16 00:35:22 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup A domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:22 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup AAAA domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup TXT domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: SPFHeader: Received-SPF: temperror (domain.de: Time-out on DNS 'TXT' lookup of 'domain.de') receiver=michael01.XXXXX.de; identity=pra; pra="mk@domain.de"; helo=out.domain.de; client-ip=11.22.33.44
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup TXT dkim._domainkey.domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: DKIMResult: invalid (public key: DNS error: query timed out)
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: DKIMSignatureResult: invalid (public key: DNS error: query timed out)
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup TXT _domainkey.domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup TXT _policy._domainkey.domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup TXT _adsp._domainkey.domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup MX domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: DKIM EOM Error DNS error: query timed out<LF>
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Temp DKIM Error - DNS error: query timed out<LF>
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup MX domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup NS domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup A domain.de aborted due to previous DNS Lookup timeout on domain.de
Oct 16 00:35:23 testmaschine01 authentication_milter[26552]: 3371A6C150D: ERROR: Lookup AAAA domain.de aborted due to previous DNS Lookup timeout on domain.de

If I configure them, everything is fine:

"dns_resolvers"         : [
   "127.0.0.1",
   "8.8.8.8"
],

Which resolvers are being used if I don't specify dns_resolvers, I would have expected that the "operating system default resolvers" are being used...

OK, I still have problems if I define specific dns_resolvers:

"dns_timeout"           : 10,
"dns_retry"             : 3,
"dns_servfail_timeout"  : 2000000,
"dns_resolvers"         : [
   "9.9.9.9",
   "8.8.8.8"
],

In the logfiles (see attached) I see that I get "timed out" errors in less than 1 second. How is that possible, the timeout is defined as 10 seconds?

At "12:09:34" the connection is coming in, and at "12:09:34" I also get "query timed out" errors:

Oct 17 12:09:34 testmaschine01 authentication_milter[18624]: 081846C14AD: ERROR: DNS Lookup TXT urlaub.fti.de error, hold set on fti.de : query timed out
Oct 17 12:09:34 testmaschine01 authentication_milter[18624]: 081846C14AD: Metrics: Counting senderid_total:1:ident=authentication_milter,result=temperror
Oct 17 12:09:34 testmaschine01 authentication_milter[18624]: 081846C14AD: SenderIdCode: temperror

See attached logfile. I guess something is wrong there? How is it possible that it timed out in less than a second?

If I try to resolve the query manually with dig it works fine. rspamd is also running on that maschine and has no DNS issues...

Another question:
ERROR: DNS Lookup PTR 93.191.164.99 error, hold set on 99 : query timed out
What does it mean "hold set on 99"? Should there be the full IP address instead of just "99"?

auth_milter_log.txt

When not specified the default behavior of Net::DNS is used, this should be to use the default system resolvers, however exactly how that works can very per system.

The hold should be set such that related lookups are grouped together, avoiding many lookups to servers that are failing. Given this is a PTR lookup I suspect the PSL lookup is returning 99 incorrectly. I'll open a bug about that.

What version of Net::DNS::Resolver are you running?

The hold should be set such that related lookups are grouped together, avoiding many lookups to servers that are failing. Given this is a PTR lookup I suspect the PSL lookup is returning 99 incorrectly. I'll open a bug about that.

Thanks!

What version of Net::DNS::Resolver are you running?

Let me see.

For debugging I installed authentication_milter on a second machine 2 days ago. On that second machine DNS seems to work fine (only these strange "ERROR: DNS Lookup PTR 211.91.58.92 error, hold set on 92 : SERVFAIL" messages sometimes).

On the machine with the DNS problems I get this:

cpan -D Net::DNS::Resolver
....
Net::DNS::Resolver
-------------------------------------------------------------------------
        (no description)
        N/NL/NLNETLABS/Net-DNS-1.32.tar.gz
        /usr/share/perl5/Net/DNS/Resolver.pm
        Installed: 1564
        CPAN:      1818  Not up to date
        NLnet Labs (NLNETLABS)
        cpan@nlnetlabs.nl

On the machine where I don't have problems I get this:

cpan -D Net::DNS::Resolver
....
Net::DNS::Resolver
-------------------------------------------------------------------------
        (no description)
        N/NL/NLNETLABS/Net-DNS-1.32.tar.gz
        /usr/local/share/perl/5.30.0/Net/DNS/Resolver.pm
        Installed: 1818
        CPAN:      1818  up to date
        NLnet Labs (NLNETLABS)
        cpan@nlnetlabs.nl

Maybe it can be fixed by upgrading Net::DNS::Resolver from 1564 to 1818 on the problematic machine.... I just tried cpan-outdated -p | cpanm but it did not update Net::DNS::Resolver 😠

I then tried cpan> upgrade Net::DNS::Resolver and now it's updated to 1818... I restarted authentication_milter, let's see if the DNS errors are gone now.

After updating Net::DNS::Resolver everything is fine now, no unexpected DNS errors/timeouts anymore. Thanks for your help!