traefik/traefik

raft consensus not working as expected during acme cert generation

Issue closed · 2 comments

Do you want to request a feature or report a bug?

Bug

Did you try using a 1.7.x configuration for the version 2.0?

  • Yes
  • No

What did you do?

Ran Traefik as an ingress controller (DaemonSet) in Kubernetes, with etcd as the KV storage backend.

What did you expect to see?

Certificates generated through Let's Encrypt using the HTTP challenge, with a single instance handling the process, as described in https://docs.traefik.io/user-guide/cluster/#traefik-cluster-and-lets-encrypt

What did you see instead?

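Traefik does not consistently use the same instance to solve the challenge(s). In the failing run in the log output below, one pod (traefik-vjx72) starts the HTTP-01 challenge, while another pod (traefik-pcxcv) reports that it cannot find the challenge token.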
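To make the suspected failure mode concrete, here is a toy model in Python. It is only my guess at what is happening, not Traefik's actual code: `SharedStore`, `Instance`, `can_commit`, and `validate` are all hypothetical names, and the assumption is that a challenge token only becomes visible to other instances once it has been written to the shared KV store.

```python
class SharedStore:
    """Hypothetical stand-in for the shared KV store (etcd in this setup)."""

    def __init__(self):
        self.tokens = {}


class Instance:
    """Hypothetical stand-in for one Traefik pod."""

    def __init__(self, name, store, can_commit=True):
        self.name = name
        self.store = store
        # can_commit=False models a datastore sync that does not go through.
        self.can_commit = can_commit

    def present(self, token, key_auth):
        """Start an HTTP-01 challenge and try to publish the token."""
        if self.can_commit:
            self.store.tokens[token] = key_auth

    def serve(self, token):
        """Answer a validation request using only the shared store."""
        return self.store.tokens.get(token)


def validate(instances, token, expected):
    """Any instance may receive the validation request; check each one."""
    return {i.name: i.serve(token) == expected for i in instances}


store = SharedStore()
starter = Instance("traefik-vjx72", store, can_commit=False)
other = Instance("traefik-pcxcv", store)

# The starter begins the challenge, but the token never reaches the store.
starter.present("example-token", "example-key-auth")
print(validate([starter, other], "example-token", "example-key-auth"))
```

In this model, no instance can answer the validation request when the write to the store fails, which would match the "cannot find challenge for token" error if the assumption holds.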

Version

Version:      v1.7.11
Codename:     maroilles
Go version:   go1.11.9
Built:        2019-04-26_08:42:33AM
OS/Arch:      linux/amd64

Configuration

debug = false
checkNewVersion = false
keepTrailingSlash = true
logLevel = "INFO"
defaultEntrypoints = ["http", "https"]

[traefikLog]
format = "json"

[accessLog]
format = "json"

[entryPoints]

  [entryPoints.http]
  address = ":80"
  compress = true

    [entryPoints.http.forwardedHeaders]
    trustedIPs = ["127.0.0.1/32", "192.168.0.0/16", "172.16.0.0/12", "10.0.0.0/8", "130.211.0.0/22", "35.191.0.0/16"]

  [entryPoints.https]
  address = ":443"
  compress = true

    [entryPoints.https.forwardedHeaders]
    trustedIPs = ["127.0.0.1/32", "192.168.0.0/16", "172.16.0.0/12", "10.0.0.0/8", "130.211.0.0/22", "35.191.0.0/16"]

    [entryPoints.https.tls]
    minVersion = "VersionTLS11"
    cipherSuites = [
      "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
      "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
      "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
      "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
      "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
      "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
      "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256",
      "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
      "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA",
      "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA",
      "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
      "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"
    ]

[kubernetes]

[api]

[etcd]
watch = true
endpoint = "etcd:2379"
prefix = "/traefik"
useAPIV3 = true

[acme]
email = "xxx@xxx.com"
storage = "traefik/acme/account"
entryPoint = "https"
onHostRule = true
acmeLogging = true

  [acme.httpChallenge]
  entryPoint = "http"

Relevant log output

success:

traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [domain.tld] acme: Obtaining bundled SAN certificate","time":"2019-05-07T08:32:00Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [domain.tld] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/K6DJ2NPcLp6hYufBXLIKonJJ9w0GDulcIe0DKmtsxCg","time":"2019-05-07T08:32:01Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [domain.tld] acme: Could not find solver for: tls-alpn-01","time":"2019-05-07T08:32:01Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [domain.tld] acme: use http-01 solver","time":"2019-05-07T08:32:01Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [domain.tld] acme: Trying to solve HTTP-01","time":"2019-05-07T08:32:01Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [domain.tld] The server validated our request","time":"2019-05-07T08:32:08Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [domain.tld] acme: Validations succeeded; requesting certificates","time":"2019-05-07T08:32:08Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [domain.tld] Server responded with a certificate.","time":"2019-05-07T08:32:11Z"}
traefik-vjx72 traefik {"level":"error","msg":"Datastore sync error: object lock value: expected 5f511b6b-4c15-479a-9e39-6c03465ebcde, got 5479c7f0-ad68-4dbf-be56-269d08a6d26c, retrying in 380.160476ms","time":"2019-05-07T08:32:11Z"}

failure:

traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [anotherdomain.tld] acme: Obtaining bundled SAN certificate","time":"2019-05-07T08:33:53Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [anotherdomain.tld] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/FoAp7917DAa0H36n65QKqD7ZuosFLcNFFOJdk7qKcUg","time":"2019-05-07T08:33:54Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [anotherdomain.tld] acme: Could not find solver for: tls-alpn-01","time":"2019-05-07T08:33:54Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [anotherdomain.tld] acme: use http-01 solver","time":"2019-05-07T08:33:54Z"}
traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [anotherdomain.tld] acme: Trying to solve HTTP-01","time":"2019-05-07T08:33:54Z"}
traefik-vjx72 traefik {"level":"error","msg":"Datastore sync error: object lock value: expected 1f84a220-089e-4854-87e2-f9006a6ca4fb, got 713f4e2f-6065-4c22-b3b3-656fba7e7b57, retrying in 603.006298ms","time":"2019-05-07T08:33:54Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 355.982358ms","time":"2019-05-07T08:33:56Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 530.613849ms","time":"2019-05-07T08:33:56Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 1.43469375s","time":"2019-05-07T08:33:57Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 1.881825721s","time":"2019-05-07T08:33:58Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 1.528207478s","time":"2019-05-07T08:34:00Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 1.924772981s","time":"2019-05-07T08:34:02Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 5.214508065s","time":"2019-05-07T08:34:04Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 6.400310981s","time":"2019-05-07T08:34:09Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 9.171718661s","time":"2019-05-07T08:34:15Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 14.096826394s","time":"2019-05-07T08:34:24Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 20.817540857s","time":"2019-05-07T08:34:38Z"}
traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token: cannot find challenge for token Z0PAuDqxQL7_Ug27H5x_glyYnaWQ8uUs3gfVB7KP25U","time":"2019-05-07T08:34:59Z"}
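To see at a glance which pod started a challenge and which pod failed the token lookup, the log lines can be grouped by pod. The following is a minimal sketch, assuming every line has the `<pod> <container> <json>` shape shown above. The function names and the event labels (`challenge_started`, `datastore_sync_error`, `token_lookup_error`) are my own and not part of Traefik.

```python
import json
from collections import Counter


def parse_line(line):
    """Split '<pod> <container> <json>' into (pod, record); None if malformed."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        return None
    pod, _container, payload = parts
    try:
        return pod, json.loads(payload)
    except json.JSONDecodeError:
        return None


def classify(msg):
    """Map a log message to a coarse event label (labels are illustrative)."""
    if "Trying to solve HTTP-01" in msg:
        return "challenge_started"
    if "Datastore sync error" in msg:
        return "datastore_sync_error"
    if "Error getting challenge for token" in msg:
        return "token_lookup_error"
    return "other"


def summarize(lines):
    """Count events per (pod, event label), skipping malformed lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        pod, record = parsed
        counts[(pod, classify(record.get("msg", "")))] += 1
    return counts


# Three lines taken from the failing run above.
sample = [
    'traefik-vjx72 traefik {"level":"info","msg":"legolog: [INFO] [anotherdomain.tld] acme: Trying to solve HTTP-01","time":"2019-05-07T08:33:54Z"}',
    'traefik-vjx72 traefik {"level":"error","msg":"Datastore sync error: object lock value: expected 1f84a220-089e-4854-87e2-f9006a6ca4fb, got 713f4e2f-6065-4c22-b3b3-656fba7e7b57, retrying in 603.006298ms","time":"2019-05-07T08:33:54Z"}',
    'traefik-pcxcv traefik {"level":"error","msg":"Error getting challenge for token retrying in 355.982358ms","time":"2019-05-07T08:33:56Z"}',
]

for (pod, event), count in sorted(summarize(sample).items()):
    print(pod, event, count)
```

Run against the full failing log, this should show the challenge and the datastore sync error on traefik-vjx72 and all token lookup errors on traefik-pcxcv.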

This is super frustrating. I have way too many domains I need to configure. Is there anything I can do to help figure out why it keeps going wrong so often?

Hello,

This bug is linked to a feature that no longer exists in the supported versions of Traefik.

Closing accordingly.

Please feel free to re-open it if necessary.