digitalocean/digitalocean-cloud-controller-manager

Confusion with do-loadbalancer-hostname

charlesg99 opened this issue · 4 comments

I'm not sure I've read enough on the subject, but I still don't understand how "service.beta.kubernetes.io/do-loadbalancer-hostname" is used by the ingress controller. I guess having a domain instead of an IP forces a DNS request that exits the cluster? I just fixed a cert-manager problem with this annotation, and I also don't understand why the problem happened: my other domains on the same load balancer/cluster didn't have this "pod-pod" network issue when creating SSL certificates.

My real issue is that I don't know whether setting this annotation will prevent my other domains from correctly renewing their SSL certificates. Can you clarify this and mention this use case in the documentation?

Hey @charlesg99 👋

Technically speaking, the annotation really only serves a single need, which is to return a hostname instead of an IP address in the LB status (the related code is fairly straightforward). That status is later injected into the LoadBalancer-typed Service object. This, in turn, causes Kubernetes to not do hair-pinning and instead route via the external LB IP address.
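To make that concrete, here is a rough Python sketch of the behavior. It is a simplified model, not the actual CCM or kube-proxy code: the function names and the example IP `203.0.113.10` are made up for illustration, and the only detail taken from this thread is the annotation key.

```python
# Annotation key discussed in this issue.
HOSTNAME_ANNOTATION = "service.beta.kubernetes.io/do-loadbalancer-hostname"


def load_balancer_ingress(annotations, lb_ip):
    """Model the ingress entry a cloud provider reports for a Service.

    Hypothetical helper: with the hostname annotation set, only a
    hostname is reported; otherwise the LB IP address is reported.
    """
    hostname = annotations.get(HOSTNAME_ANNOTATION, "").strip()
    if hostname:
        return {"hostname": hostname}
    return {"ip": lb_ip}


def is_short_circuited(ingress):
    """Model of the hair-pinning decision (an assumption of this sketch).

    In-cluster traffic to the LB address is only short-circuited when
    the Service status carries an IP; a hostname-only status is not.
    """
    return "ip" in ingress


without = load_balancer_ingress({}, "203.0.113.10")
with_hostname = load_balancer_ingress(
    {HOSTNAME_ANNOTATION: "mydomain.com"}, "203.0.113.10"
)
print(without, is_short_circuited(without))
print(with_hostname, is_short_circuited(with_hostname))
```

In this model, the annotated Service reports only a hostname, so a pod calling the LB address leaves the cluster and comes back in through the LB.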

I don't immediately see how the annotation / the related Kubernetes limitation could be related to your cert-manager problem: unless your setup is somehow specific / unusual, cert-manager should just talk to the API server and possibly public endpoints (e.g., to get certificates renewed). Neither should require routing through pods via a managed LB. I'm wondering if adding the annotation had some kind of side effect that addressed your specific issue without being directly tied to the technical functionality in CCM described above.
If you still have data from when cert-manager failed for you (e.g., logs, error messages, events) that could be helpful in doing root cause analysis. Otherwise, you could try to force a certificate renewal on a test setup and troubleshoot based on that.

Thanks for the answer. All I know is that when it failed, the http01 ACME challenge was accessible from outside the cluster but cert-manager failed to resolve it. Same issue as this.

I have many domains that all use the same automated deployment (same ingress resource for all of them; the only difference is the domain, which is set by Helm values) and never had this certificate issue before. After all the basic checks (DNS and such), I ended up reading that this seems to be a common issue with an external load balancer in front of the cluster's ingress.

cert-manager/cert-manager#3238 (comment)
kubernetes/kubernetes#66607 (comment)
digitalocean/Kubernetes-Starter-Kit-Developers#205 (comment)

As I said, I don't mind adding the annotation with one of the domains that resolves to the loadbalancer, but I just want to make sure that if I set "mydomain.com", it won't prevent the certificate renewal of "myotherdomain.com" down the line.

Since I added the annotation with domain "X" yesterday, I installed a different domain "Y" and its certificate generated correctly so it doesn't seem to affect it 🤞. Would be nice to have a confirmation though and I would have liked to get it from reading the documentation :)

(Apologies, I thought I had responded to this one some time ago but apparently I hadn't 🤦 )

AFAIU, the comments from the first two linked issues seem more related to the routing problem that the hostname annotation is supposed to address.

The third one does go more into the problem you're facing. I'm no expert in Let's Encrypt and the http01 challenge in particular, but what I could imagine happening is that cert-manager is actually executing the self-check. If that's the case and the domain name is pointing at the DO LB, then the requests would be bypassing the LB and thereby possibly breaking the LE / validation flow? That'd explain why the hostname annotation fixes the issue, as it forces requests to leave the cluster.
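For illustration, a self-check of that kind might look roughly like the Python sketch below. It is not cert-manager's implementation: `challenge_url`, `self_check`, and the `fetch` parameter are hypothetical names. The only protocol detail assumed is the `/.well-known/acme-challenge/<token>` path that the http01 challenge uses.

```python
from urllib.request import urlopen


def challenge_url(domain, token):
    """Build the URL an http01 challenge response is served from."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def self_check(domain, token, expected_body, fetch=None):
    """Return True if the challenge URL serves the expected body.

    `fetch` is a hypothetical hook (URL in, response text out) so the
    check can be exercised without real network access.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=10) as resp:
                return resp.read().decode()
    try:
        return fetch(challenge_url(domain, token)).strip() == expected_body
    except OSError:
        # Connection failures and HTTP errors both count as a failed check.
        return False
```

Run from a pod, a check like this travels whatever path the cluster picks for the LB address. That is why it can fail in-cluster while the same URL is reachable from outside.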

I don't think it'd affect any other domain you might have given that the annotation should only impact the routing path from a pod towards the LB IP address. If anything, I'd argue that setting it should lead to a more "natural" behavior for most use cases as requests take a full roundtrip.

Let me know if that makes sense to you.