hashicorp/consul

1.8.2/1.8.4 shows expired cert for Consul, which is not the case (certs via LetsEncrypt)

linuxmail opened this issue · 5 comments

Nomad version

Output from nomad version

Nomad v1.8.4

Operating system and Environment details

  • Debian Bullseye

We have a Sandbox stack:

  • NodeServer1

    • Nomad Server
    • Consul Server
    • Vault Server
  • NodeAgent1

    • Nomad agent
    • Consul agent

The certs are generated by a cron job via Let's Encrypt and used by all HashiCorp daemons. This worked well for about a year with Nomad 1.6.9 (and below); in August I replaced it with v1.8.2. The cron job also restarts all three components (Vault/Consul/Nomad).

Issue

On 3 Oct I ran into a strange issue: Nomad 1.8.2 could no longer connect to Consul and logged:

Oct  3 13:49:24 hashi-agent-02 nomad[5571]:     2024-10-03T13:49:24.147+0200 [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get \"https://hashi-agent-02.sandbox.work:8501/v1/catalog/datacenters\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2024-10-03T13:49:24+02:00 is after 2024-09-30T18:14:03Z"

Even after restarting Consul and Nomad, and even the whole VM, the same message shows up. But if I check the same URL (https://hashi-agent-02.sandbox.work:8501) with curl, it shows the correct, valid cert.

So I replaced Nomad v1.8.2 with v1.8.4, but the issue remained. The last resort was to disable verify_ssl in the Nomad agent config, and then Nomad was fine again.

Even OpenSSL is happy:

hashi-agent-03:[~]: openssl s_client -quiet -connect hashi-agent-03.sandbox.work:8501 -CAfile /etc/ssl/acme/ca.pem
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify error:num=2:unable to get issuer certificate
issuer= O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = E6
issuer= C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=0 CN = *.sandbox.work
issuer= C = US, O = Let's Encrypt, CN = E6
verify return:1

HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close

400 Bad Request
  • Nomad Consul config:
  "consul": {
    "address": "hashi-agent-03.sandbox.work:8501",
    "server_service_name": "nomad",
    "client_service_name": "nomad-client",
    "auto_advertise": true,
    "verify_ssl": true,
    "checks_use_advertise": true,
    "server_auto_join": true,
    "client_auto_join": true,
    "ssl": true,
    "ca_file": "/etc/ssl/acme/ca.pem",
    "cert_file": "/etc/ssl/acme/_.sandbox.work.crt",
    "key_file": "/etc/ssl/acme/_.sandbox.work.key",
    "token": "<redacted>"
  },

Also, as I said, I've rebooted the whole stack (server / agent) and I still get the expired cert message.

Nomad Client logs (if appropriate)

Oct  4 09:57:49 hashi-agent-03 nomad[3511999]:     2024-10-04T09:57:49.030+0200 [WARN]  agent: (view) kv.block(nomad/client_portal_api/environment): Get "https://hashi-agent-03.sandbox.work:8501/v1/kv/nomad/client_portal_api/environment?stale=&wait=300000ms": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2024-10-04T09:57:49+02:00 is after 2024-09-30T18:14:03Z (retry attempt 6 after "8s")

root@hashi-agent-03:[~]: curl -v  https://hashi-agent-03.sandbox.work:8501/v1/kv/nomad/tms_service/environment?stale=&wait=300000ms
[1] 3515019
root@hashi-agent-03:[~]: *   Trying 10.4.1.23:8501...
* Connected to hashi-agent-03.sandbox.work (10.4.1.23) port 8501 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.sandbox.work
*  start date: Sep 29 21:03:00 2024 GMT
*  expire date: Dec 28 21:02:59 2024 GMT
*  subjectAltName: host "hashi-agent-03.work" matched cert's "*.sandbox.work"
*  issuer: C=US; O=Let's Encrypt; CN=E6
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5557cdae5620)
> GET /v1/kv/nomad/tms_service/environment?stale= HTTP/2
> Host: hashi-agent-03.sandbox.work:8501
> user-agent: curl/7.74.0
> accept: */*
> 
...

I have no idea where to look.

LMAO
why do you use letsencrypt for TLS internals
it's strictly denied in docs due to possibility of unauthorized access
drop it and use nomad tls

Hi, thanks for the hint; I'm already aware of that. But it does not explain my issue.

Hi @linuxmail what you're showing looks roughly ok to me but maybe something has changed in the Consul API client. I'm going to move this issue to the Consul repo where I think you'll get more specific help for Consul.

Hi @linuxmail - I found this issue based on the 'not after' field. I can't help with Consul or Nomad.

However, a hint is the 'not after' expiry date of 2024-09-30T18:14:03Z. It matches the expiry of an old, retired "ISRG Root X1" certificate (there is a newer root cert with the same name).

There is mention of this here: https://community.letsencrypt.org/t/isrg-root-x1-expiring-please-confirm-the-latest-cert/225845/3, and the retired ISRG Root X1 cross-signed by DST Root CA X3 cert can be found on https://letsencrypt.org/certificates/

Hopefully this hint points you in the direction of your TLS client, and the root certificates it has available.