Error: "solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil>"
polarathene opened this issue · 1 comments
First time trying out the DNS provider feature. Tried with Caddyfile as below with scoped API token passed in as ENV var.
- Waited a while with nothing happening until an error occurred.
- Quick glance through existing issues, doesn't appear to be an earlier issue about it.
- While raising this issue, the third attempt was successful, taking about 10-30 minutes to be successful (reproduced multiple times).
Is this expected behaviour of the feature with Cloudflare, or is the time taken and earlier failures a possible problem?
Environment Details:
- The apex domain (
example.com
, no subdomain) has a reverse proxy (Traefik) that runs on a different server handling several sub-domains and a CNAME wildcard for non-explicit subdomains. - The Caddy instance is a fresh 2.4 binary from the download page with the Cloudflare module added, run via a fresh instance of Fedora 34 on Vultr.
- Unlike when running Caddy via Docker, Fedora additionally required running a firewall command to allow HTTP/HTTPS traffic. Prior to trying the Cloudflare DNS module, a different subdomain was already configured on Cloudflare to the Vultr Fedora VPS IP and regular LetsEncrypt which worked successfully.
Caddyfile:
{
acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
email admin@example.com
}
test.example.com {
tls {
dns cloudflare {env.CF_API_TOKEN}
}
root * /srv/site
file_server
}
Caddy stdout log:
2021/05/22 06:18:10.447 INFO serving initial configuration
2021/05/22 06:18:10.433 INFO tls cleaning storage unit {"description": "FileStorage:/root/.local/share/caddy"}
2021/05/22 06:18:10.430 INFO tls.cache.maintenance started background certificate maintenance {"cache": "0xc000260460"}
2021/05/22 06:18:10.449 INFO tls.obtain acquiring lock {"identifier": "test.example.com"}
2021/05/22 06:18:10.451 INFO tls.obtain lock acquired {"identifier": "test.example.com"}
2021/05/22 06:18:10.452 INFO tls.issuance.acme waiting on internal rate limiter {"identifiers": ["test.example.com"]}
2021/05/22 06:18:10.452 INFO tls.issuance.acme done waiting on internal rate limiter {"identifiers": ["test.example.com"]}
2021/05/22 06:18:10.454 INFO tls finished cleaning storage units
2021/05/22 06:18:12.092 INFO tls.issuance.acme.acme_client trying to solve challenge {"identifier": "test.example.com", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2021/05/22 06:20:15.996 INFO tls.issuance.acme waiting on internal rate limiter {"identifiers": ["test.example.com"]}
2021/05/22 06:20:15.997 INFO tls.issuance.acme done waiting on internal rate limiter {"identifiers": ["test.example.com"]}
2021/05/22 06:20:17.104 INFO tls.issuance.acme.acme_client trying to solve challenge {"identifier": "test.example.com", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2021/05/22 06:22:19.301 ERROR tls.obtain will retry {"error": "[test.example.com] Obtain: [test.example.com] solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil> (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/19618631/59618995) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 1, "retrying_in": 60, "elapsed": 248.850234978, "max_duration": 2592000}
2021/05/22 06:23:20.908 INFO tls.issuance.acme.acme_client trying to solve challenge {"identifier": "test.example.com", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2021/05/22 06:25:24.293 INFO tls.issuance.acme.acme_client trying to solve challenge {"identifier": "test.example.com", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2021/05/22 06:27:25.852 ERROR tls.obtain will retry {"error": "[test.example.com] Obtain: [test.example.com] solving challenges: waiting for solver certmagic.solverWrapper to be ready: timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil> (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/19618631/59621711) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)", "attempt": 2, "retrying_in": 120, "elapsed": 555.400860953, "max_duration": 2592000}
2021/05/22 06:29:28.604 INFO tls.issuance.acme.acme_client trying to solve challenge {"identifier": "test.example.com", "challenge_type": "dns-01", "ca": "https://acme-staging-v02.api.letsencrypt.org/directory"}
2021/05/22 06:29:31.836 INFO tls.issuance.acme.acme_client validations succeeded; finalizing order {"order": "https://acme-staging-v02.api.letsencrypt.org/acme/order/19618631/59623852"}
2021/05/22 06:29:32.512 INFO tls.issuance.acme.acme_client successfully downloaded available certificate chains {"count": 1, "first_url": "https://acme-staging-v02.api.letsencrypt.org/acme/cert/fa59c8fe71cfcb679672d9cb9c6fda649b27"}
2021/05/22 06:29:32.514 INFO tls.obtain certificate obtained successfully {"identifier": "test.example.com"}
2021/05/22 06:29:32.514 INFO tls.obtain releasing lock {"identifier": "test.example.com"}
TL;DR
Is usage of this module expected to fail multiple times in a valid setup and take 10 minutes to be successful?
If not, could the cause be related to :
- Server setup such as Fedora default security policies interfering?
- Existing DNS configuration? (multiple servers for a given domain via A records on specific subdomains, and a
*
CNAME record to the A record of the apex domainexample.com
)
Changed wildcard CNAME record to point to an A record for the server running caddy, no improvement. Applies to both staging and production ACME configs. Behaviour is reproducible.
timed out waiting for record to fully propagate; verify DNS provider configuration is correct - last error: <nil>
- I looked at the Cloudflare dashboard and can see the
_acme-challenge.test
TXT record, this value changes each time after an error retry (60 seconds). - I have queried the challenge TXT record via
https://dnschecker.org/#TXT/
which shows success on all checks around the world while Caddy fails to query the record successfully. - On my local system querying the DNS record also works.
Investigated the DNS query from within the Vultr VPS:
dig:
dig TXT _acme-challenge.test.example.com
; <<>> DiG 9.16.11-RedHat-9.16.11-5.fc34 <<>> TXT _acme-challenge.test.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31271
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;_acme-challenge.test.example.com. IN TXT
;; ANSWER SECTION:
_acme-challenge.test.example.com. 269 IN CNAME vultr-vps.example.com.
;; AUTHORITY SECTION:
example.com. 269 IN SOA mia.ns.cloudflare.com. dns.cloudflare.com. 2037333583 10000 2400 604800 3600
example.com. 1436 IN SOA mia.ns.cloudflare.com. dns.cloudflare.com. 2037333381 10000 2400 604800 3600
;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sat May 22 07:44:29 UTC 2021
;; MSG SIZE rcvd: 174
$ dog _acme-challenge.test.example.com TXT
CNAME _acme-challenge.test.example.com. 3m39s "vultr-vps.example.com."
SOA example.com. 3m39s A "mia.ns.cloudflare.com." "dns.cloudflare.com." 2037333631 2h46m40s 40m00s 7d0h00m00s 1h00m00s
$ dog _acme-challenge.test.example.com TXT @1.1.1.1
TXT _acme-challenge.test.example.com. 5m00s "sPxh-D3aWfGismO-Y4w9iRzhDyb5btdPH126rDYczNM"
systemd resolvectl: (appears to be what the VPS was using for managing DNS)
$ cat /etc/resolv.conf | grep nameserver
nameserver 127.0.0.53
$ resolvectl dns
Global:
Link 2 (ens3): 108.61.10.10
Link 3 (docker0):
$ resolvectl --cache no -t TXT query _acme-challenge.test.example.com
_acme-challenge.test.example.com: resolve call failed: Name 'vultr-vps.example.com' does not have any RR of the requested type
$ resolvectl flush-caches
$ resolvectl --cache no -t TXT query _acme-challenge.test.example.com
_acme-challenge.test.example.com IN TXT "iUXEbzj8ZFKRMy-lsPIm-BzKk2NstsykMzVp6G_xTDo" -- link: ens3
-- Information acquired via protocol DNS in 1.6ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
- Querying with a different DNS resolver than whatever Vultr was presumably providing on the
ens3
ethernet connection seemed to work well. resolvectl flush-caches
didn't seem to have an effect.resolvectl --cache no
sometimes failed providing a CNAME record (the wildcard CNAME record on Cloudflare pointed an A recordvultr-vps.example.com
that was reported as a CNAME) instead of the TXT record that was present/available.- When
resolvectl
was successful, Caddy would follow shortly after in challenge success, this was always the case. - Configuring Caddyfile to use a different DNS resolver always worked and is the preferred solution.
Solution
If having DNS issues, and examples above for querying the DNS TXT record for _acme-challenge
(which is present on your Cloudflare DNS dashboard) are not working without a custom resolver (like with dog
query); then include resolvers 1.1.1.1
to use Cloudflare DNS resolver in the tls
directive:
Caddyfile:
{
acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
email admin@example.com
}
test.example.com {
tls {
dns cloudflare {env.CF_API_TOKEN}
resolvers 1.1.1.1
}
root * /srv/site
file_server
}
- This solution was mentioned in a prior issue: #13 (comment)
- Possible behaviour of early failure with eventual success (be that 10-30 minutes or more due to incremental backoff time on retries) is described here: cert-manager/cert-manager#1035 (comment)