sensson/puppet-powerdns

pdns_record keeps looping

Closed this issue · 19 comments

My backend is sqlite3.

I have this entry to create a zone:

powerdns_zone { 'dumb.zone':; }

And I tried all of these, with the same result: it keeps adding the record again and again. although the record is present and it's working.

powerdns_record { 'dumb-record':
  target_zone => 'dumb.zone',
  rname       => 'dumb-record',
  rtype       => 'A',
  rcontent    => '192.168.1.10',
}
powerdns_record { 'dumb-record.dumb.zone':
  rname    => 'dumb-record',
  rtype    => 'A',
  rcontent => '192.168.1.10',
}
powerdns_record { 'dumb-record.dumb.zone':
  rcontent => '192.168.1.10',
}

Setting manage_record to false works, but I don't know if this is the intended:

powerdns_zone { 'dumb.zone':
  manage_records => false,
}

am I doing something wrong?

ju5t commented

@trefzer do you know why this can happen?

We observed something similar.
When listing the Zone, pdnsutil printed some extra output like
Apr 24 15:08:33 gsqlite3: connection to '/var/lib/powerdns/powerdns.sqlite3' successful
in every run. The date and time were different every time.
We fixed it by lowering/deleting the loglevel setting in /etc/pdns/pdns.conf, which was set to 6 before.

I've had similar issues with manage_records => true and have pushed a fix for it at #171. What I was seeing on dumping the output of should_content was that content.push("$ORIGIN .\n"), presumably so written to ensure a \n at the end of the content, was then being sorted on the next line, resulting in:

$ORIGIN .

<begin records>

which never matched the output of pdnsutil, which instead produced output like:

$ORIGIN .
<begin records>

After applying this change locally, I'm no longer seeing any zones re-imported unless their records actually differ.

Okay now I can't trigger the bug with the old code, so something else has changed.

In your case it's more likely that there is some log output or something else extraneous in the pdnsutil output; while I don't think my change is actually incorrect, I don't think the extraneous \n is causing any problems. I suspect I fixed a few things at once and \n was never actually causing a re-import.

ju5t commented

I'm curious if this is related to a PowerDNS change or if it's an actual bug in the code that has never worked in the past.

We're not using it ourselves so it's difficult for me to verify this.

Okay, so, step back and some clarity on the more generalised issue:

We're currently running PowerDNS 4.8.3 with the PostgreSQL backend.

Examining the PowerDNS database, it appears that PowerDNS never stores records internally with a trailing ., for record types that expect one (PTR, CNAME etc). When queried as normal, these records are returned with the trailing ., but they are stored without it, whether they're imported with it or not.

powerdns=# select * from records where content like '%bck%';
    id    | domain_id |                         name                         | type  |           content            | ttl | prio | disabled | ordername | auth 
----------+-----------+------------------------------------------------------+-------+------------------------------+-----+------+----------+-----------+------
 21551016 |         2 | bck01.lab.example.com                                | CNAME | vl7.bck01.lab.example.com    | 300 |    0 | f        |           | t
 21551826 |        13 | 8.0.30.172.in-addr.arpa                              | PTR   | bck01.lab.example.com        | 300 |    0 | f        |           | t
(2 rows)

When fetching zones with pdnsutil list-zone, PowerDNS is inconsistent about whether the trailing . is added or not. For example:

root@dns01:~# pdnsutil list-zone lab.example.com | grep bck
bck01.lab.example.com        300     IN      CNAME   vl7.bck01.lab.example.com.
vl7.bck01.lab.example.com    300     IN      A       172.30.0.8

The CNAME record has a trailing ., whereas:

root@dns01:~# pdnsutil list-zone 0.30.172.in-addr.arpa | grep bck
8.0.30.172.in-addr.arpa 300     IN      PTR     bck01.lab.example.com

The PTR record does not.

The same confusion affects SOA records:

root@dns21:~# dig -t SOA lab.example.com +short
dns.lab.example.com. admin.example.com. 22973 10800 3600 604800 3600
root@dns20:~# pdnsutil list-zone lab.example.com | grep SOA
lab.example.com      3600    IN      SOA     dns.lab.example.com admin.example.com 22973 10800 3600 604800 3600

Though the record is correctly queried with trailing . characters on the mname and rname values, they're missing from the database entry and the pdnsutil list-zone output.

As a separate issue, PowerDNS records that are imported with uppercase characters in the name field are automatically downcased before being saved into the database, and thus always returned as lowercase by pdnsutil list-zone, even if they were uppercase in the actual powerdns_record resource.

Since pdnsutil list-zone is the means by which the provider determines whether it needs to update the resource, this means that powerdns_zone and powerdns_record resources need to be written with respect to these quirks to ensure that should_content behaves correctly. Thus:

  • All powerdns_zone resources must omit the trailing . in the soa_mname and soa_rname parameters.
  • All powerdns_record resources must define the rname parameter in lowercase.
  • CNAME powerdns_record resources must include a trailing . in the rcontent parameter.
  • PTR powerdns_record resources must omit the trailing . in the rcontent parameter.

Any resource that fails these rules will still be accepted by PowerDNS and will still work just fine, but any affected zones will be marked as requiring correction by Puppet on every run, because the pdnsutil list-zone output will never match the resource parameters.

Since PowerDNS' behaviour is inconsistent across different record types, it doesn't necessarily feel correct to enforce these rules in the resources themselves, but failing to be aware of them means that people are going to end up with lots of endlessly repeated resources that change nothing but the zone serial.


The other thing that can happen, as mentioned earlier, is that setting loglevel above 5 in PowerDNS causes database connection messages to be written to STDERR on pdnsutil list-zone.

Unfortunately, the command execution context defined by the commands setter in provider.rb forces :combine => true, which combines STDOUT and STDERR in the command output and includes these database log lines in the output of pdnsutil(pdnsutil_options, 'list-zone', resource[:name]). Fixing this at the provider level would probably require overriding Puppet::Provider::CommandDefiner, and I don't know off the top of my head how easy (or possible) that's going to be, so it's vital when using this module that PowerDNS loglevel be kept at 5 or below.

I don't know if the behaviour of pdnsutil list-zones is different when using other backends (SQLite etc), but this is the behaviour I've observed with PostgreSQL.

https://doc.powerdns.com/md/types/ says:

Warning: Host names and the MNAME of a SOA records are NEVER terminated with a '.' in PowerDNS storage! If a trailing '.' is present it will inevitably cause problems, problems that may be hard to debug. Use pdnsutil check-zone to validate your zone data.

From the PowerDNS source:

static int listZone(const DNSName &zone) {
  UeberBackend B;
  DomainInfo di;

  if (! B.getDomainInfo(zone, di)) {
    cerr << "Zone '" << zone << "' not found!" << endl;
    return EXIT_FAILURE;
  }
  di.backend->list(zone, di.id);
  DNSResourceRecord rr;
  cout<<"$ORIGIN ."<<endl;
  cout.sync_with_stdio(false);

  while(di.backend->get(rr)) {
    if(rr.qtype.getCode()) {
      if ( (rr.qtype.getCode() == QType::NS || rr.qtype.getCode() == QType::SRV || rr.qtype.getCode() == QType::MX || rr.qtype.getCode() == QType::CNAME) && !rr.content.empty() && rr.content[rr.content.size()-1] != '.')
	rr.content.append(1, '.');

      cout<<rr.qname<<"\t"<<rr.ttl<<"\tIN\t"<<rr.qtype.toString()<<"\t"<<rr.content<<"\n";
    }
  }
  cout.flush();
  return EXIT_SUCCESS;
}

Which suggests that the record types that require a trailing . in the rcontent parameter are NS, SRV, MX and CNAME. Quite why pdnsutil list-zone chooses to apply this logic is left as an exercise to the reader. This means that it may be preferable to force the absence of a trailing . in the rcontent parameter for most record types, and then add it to the record when testing against pdnsutil list-zones for these four record types, which would enforce consistency with PowerDNS behaviour in this case.

ju5t commented

This means that it may be preferable to force the absence of a trailing . in the rcontent parameter for most record types, and then add it to the record when testing against pdnsutil list-zones for these four record types, which would enforce consistency with PowerDNS behaviour in this case.

@Longsight first of all, thanks for the thorough explanation! I agree this would be the best way forward for now.

I'm not sure yet if this will solve the issue @maxadamo is facing as he had an issue with A-records, but it's definitely something that needs fixing. Considering the amount of time you already spent on this I'm a little hesitant to ask, but is it something you could PR you think?

Ps. I can confirm the behaviour of pdnsutil is exactly the same with a MySQL database.

I'll refresh my memory of the type/provider API and have a look at this tonight, yeah.

I'm not sure yet if this will solve the issue @maxadamo is facing as he had an issue with A-records

Without further investigation I can't say for sure, but if @maxadamo has loglevel set to 6 or more then they will 100%
be experiencing the issue described. The most sensible thing would be to cull log lines from the command output in def content in provider/powerdns_zone_private/pdnsutil.rb; I'll include that in the PR.

I'm not sure yet if this will solve the issue @maxadamo is facing as he had an issue with A-records

Without further investigation I can't say for sure, but if @maxadamo has loglevel set to 6 or more then they will 100% be experiencing the issue described.

@Longsight I'm using the default loglevel. I'm not understanding what is the default, but the log files are pretty much empty.
Only few lines are being written to logs upon service restart.

@maxadamo what do you get when you run pdnsutil list-zone dumb.zone? Anything unexpected in the output?

Essentially this is a question of comparing the output of that command with the expected output that the provider builds from the resources - if there are unexpected lines in the output, then it'll never match and always refresh the resource.

@Longsight there you go. Why do I see bindbackend?

# pdnsutil list-zone dumb.zone
Apr 26 17:24:29 [bindbackend] Done parsing domains, 0 rejected, 0 new, 0 removed
$ORIGIN .
dumb-record.dumb.zone	3600	IN	A	192.168.1.10
dumb.zone	3600	IN	SOA	a.powerdns.server hostmaster 4 10800 3600 604800 3600

but, why do I see bindbackend, if I declared sqlite only?

  class { 'powerdns':
    recursor_version => $pdns_major_version,
    recursor         => true,
    backend          => 'sqlite';
  }

for me it was solved removing the package pdns-backend-bind
At some point this package got installed, and I don't know why. Perhaps I initially declared the class without specifying sqlite, but it seems to work, and it's not being reinstalled by the module.
For the sake of sharing, pdns_record now works and this is the output of the tool:

# pdnsutil list-zone dumb.zone
$ORIGIN .
dumb-record.dumb.zone	3600	IN	A	192.168.1.10
dumb.zone	3600	IN	SOA	a.powerdns.server hostmaster 4 10800 3600 604800 3600

would it be an option to uninstall backends that have not been declared in the module?

It's automatically purged if the chosen backend is postgresql or ldap, it should probably be purged for all other backends as well since it's pulled in as a recommended package by pdns-server on Debian.

The default backend in the module is mysql so if you don't specify one at all it won't default to bind.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We get that this might not be what you had expected but we're trying to limit stale as issues as much as possible. Thank you for your contributions.