piraeusdatastore/helm-charts

snapshot-controller: error while generating certificate with cert manager

jkossis opened this issue · 3 comments

When using the webhook's certManagerIssuerRef configuration, the following error occurs while generating the certificate:

```
Name:         snapshot-validation-webhook
Namespace:    snapshot-controller
Labels:       app.kubernetes.io/instance=snapshot-controller
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=snapshot-validation-webhook
              app.kubernetes.io/version=v6.3.3
              helm.sh/chart=snapshot-controller-2.0.4
Annotations:  meta.helm.sh/release-name: snapshot-controller
              meta.helm.sh/release-namespace: snapshot-controller
API Version:  cert-manager.io/v1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2024-01-08T16:35:43Z
  Generation:          1
  Resource Version:    2103731
  UID:                 953009bf-dad7-47ae-aac0-678ab7191808
Spec:
  Dns Names:
    snapshot-validation-webhook.snapshot-controller.svc
  Issuer Ref:
    Kind:  ClusterIssuer
    Name:  cloudflare
  Private Key:
    Rotation Policy:  Always
  Secret Name:        snapshot-validation-webhook-tls
Status:
  Conditions:
    Last Transition Time:    2024-01-08T16:35:43Z
    Message:                 Issuing certificate as Secret does not exist
    Observed Generation:     1
    Reason:                  DoesNotExist
    Status:                  False
    Type:                    Ready
    Last Transition Time:    2024-01-08T16:35:44Z
    Message:                 The certificate request has failed to complete and will be retried: Failed to wait for order resource "snapshot-validation-webhook-1-112280392" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "snapshot-validation-webhook.snapshot-controller.svc": Domain name does not end with a valid public suffix (TLD)
    Observed Generation:     1
    Reason:                  Failed
    Status:                  False
    Type:                    Issuing
  Failed Issuance Attempts:  1
  Last Failure Time:         2024-01-08T16:35:44Z
Events:
  Type     Reason     Age   From                                       Message
  ----     ------     ----  ----                                       -------
  Normal   Issuing    35s   cert-manager-certificates-trigger          Issuing certificate as Secret does not exist
  Normal   Generated  35s   cert-manager-certificates-key-manager      Stored new private key in temporary Secret resource "snapshot-validation-webhook-m87kf"
  Normal   Requested  35s   cert-manager-certificates-request-manager  Created new CertificateRequest resource "snapshot-validation-webhook-1"
  Warning  Failed     34s   cert-manager-certificates-issuing          The certificate request has failed to complete and will be retried: Failed to wait for order resource "snapshot-validation-webhook-1-112280392" to become ready: order is in "errored" state: Failed to create Order: 400 urn:ietf:params:acme:error:rejectedIdentifier: Error creating new order :: Cannot issue for "snapshot-validation-webhook.snapshot-controller.svc": Domain name does not end with a valid public suffix (TLD)
```

It looks like Cert Manager is unhappy with the svc suffix ... I can't see how this has worked previously.

> It looks like Cert Manager is unhappy with the svc suffix ... I can't see how this has worked previously.

More specifically, it is unhappy because it was asked to issue a "public" certificate through ACME, but the requested name is a cluster-internal service name without a valid top-level domain.

But we don't need a public certificate at all. We only need a certificate that validates the internal service name (hence the .svc suffix). Users typically create an internal issuer, for example one of the SelfSigned or CA type, and use it to provision these internal certificates.
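As a rough sketch of what such an internal issuer could look like, the manifests below bootstrap a private CA with cert-manager: a SelfSigned issuer signs a CA certificate, and a CA issuer then signs certificates from it. All names here (`selfsigned-bootstrap`, `internal-ca`, `internal-ca-tls`) are hypothetical. `internal-issuer` is chosen only to match the name used in the chart's README example, and the `cert-manager` namespace is assumed to be the cluster resource namespace of your cert-manager installation.

```yaml
# Minimal sketch, not the chart's prescribed setup; all names are hypothetical.

# 1. A self-signed issuer, used only to bootstrap the CA certificate.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}
---
# 2. The CA certificate itself, signed by the bootstrap issuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
  # Assumption: cert-manager's cluster resource namespace, where a
  # ClusterIssuer looks up the secret it references.
  namespace: cert-manager
spec:
  isCA: true
  commonName: internal-ca
  secretName: internal-ca-tls
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer
    group: cert-manager.io
---
# 3. The issuer to reference from webhook.tls.certManagerIssuerRef.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-issuer
spec:
  ca:
    secretName: internal-ca-tls
```

With an issuer like this, the webhook certificate for the .svc name is signed inside the cluster and no ACME order is created.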

I don't really see why one would want a public certificate for this internal component. If the snapshot-controller were reachable from outside the cluster, something would already have gone very wrong.

@WanzenBug, I appreciate you providing some more info on this. Admittedly, this would be very useful info to include in the portion of the readme that pertains to using cert-manager. Right now, this is all that is there:

A [cert-manager.io](https://cert-manager.io/) issuer able to create a certificate for the webhook service.

To use this method, create an override file like:

```yaml
webhook:
  tls:
    certManagerIssuerRef:
      name: internal-issuer
      kind: ClusterIssuer
```

To apply the override, use `--values <override-file>`.

For sure. Patches welcome 😃