Clustering impossible when readiness gates are used on port 8080
c-datculescu opened this issue · 3 comments
Describe the bug
When clustering is combined with readiness gates (AWS ALB readiness gates), the pods cannot start. No endpoint becomes available (so the readiness gate cannot pass) until the endpoints have been populated, but the endpoints are never populated until the readiness gate passes. This creates a loop that never allows a pod to start fully.
To Reproduce
Steps to reproduce the behavior:
- Use EKS
- Deploy kube-httpcache, 2 pods minimum
- Check the kube-httpcache logs; an error message like the following appears:
W0308 14:41:30.853956 1 endpoints_watch.go:66] service 'some_random_service' has no endpoints
Expected behavior
I would expect to be able to cluster the pods.
Environment:
- Kubernetes version: [e.g. 1.26]
- kube-httpcache version: [e.g. v0.7]
Configuration
Additional context
I'm having the same issue. It's a catch-22.
I solved it with a custom readiness check script that always returns success on the first check and afterwards only reports success if the cached site is available on 127.0.0.1:8080. But it's a dirty hack, of course.
What exactly does the frontend watch do? What happens when I turn it off? Is it related to distributing signals, e.g. PURGE?
I have the same issue. It helps to have a Service with .spec.publishNotReadyAddresses=true, but then another problem will appear.
When pods are added (by scaling a Deployment or StatefulSet up), there is a race condition in pkg/watcher/endpoints_watch.go:89.
The new pods are added to the Service, but they are not necessarily ready for all conditions yet, so the check on that line discards their addresses from the list. On the next event (for example, after scaling up again), the previously skipped pod is included (assuming it is ready by then), but the newly added pod hits the same race condition and will probably be missed as well.
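The race described above can be illustrated with a simplified filter. This is a hypothetical sketch: the Address type and the "all conditions must be True" rule model the check as this comment describes it and are not kube-httpcache's actual code. It shows how a pod that is in the Service but not yet fully ready gets dropped from the snapshot taken at event time, and what skipping the check would change.

```go
package main

import (
	"fmt"
	"sort"
)

// Address is a hypothetical, simplified stand-in for an endpoint address
// together with the readiness conditions of the pod behind it.
type Address struct {
	IP         string
	Conditions map[string]bool // condition type -> status is True
}

// allConditionsTrue models the check described in the comment: an address
// is only used if every pod condition is currently True.
func allConditionsTrue(a Address) bool {
	for _, ok := range a.Conditions {
		if !ok {
			return false
		}
	}
	return true
}

// filterAddresses returns the sorted IPs to use as peers. With skipCheck set
// (the behaviour the proposed flags would enable), every address is kept.
func filterAddresses(addrs []Address, skipCheck bool) []string {
	var ips []string
	for _, a := range addrs {
		if skipCheck || allConditionsTrue(a) {
			ips = append(ips, a.IP)
		}
	}
	sort.Strings(ips)
	return ips
}

func main() {
	// Snapshot at the time of the scale-up event: the new pod is already in
	// the Service (publishNotReadyAddresses=true) but not yet Ready.
	event := []Address{
		{IP: "10.0.0.1", Conditions: map[string]bool{"Ready": true, "ContainersReady": true}},
		{IP: "10.0.0.2", Conditions: map[string]bool{"Ready": false, "ContainersReady": true}},
	}
	fmt.Println(filterAddresses(event, false)) // [10.0.0.1]
	fmt.Println(filterAddresses(event, true))  // [10.0.0.1 10.0.0.2]
}
```

Because the filter only runs when an endpoints event arrives, 10.0.0.2 stays excluded until some later event happens to re-evaluate it, which is the behaviour reported here.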
I would suggest adding command-line options to disable this check and always include all frontend/backend endpoints (depending on which option is set):
--no-frontend-condition-check
--no-backend-condition-check
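A minimal sketch of how the two proposed switches might be registered, assuming kube-httpcache parses its options with Go's standard flag package. Neither flag exists today; the names come from the proposal, and the options struct and parseOptions helper are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the two proposed (hypothetical) switches.
type options struct {
	noFrontendConditionCheck bool
	noBackendConditionCheck  bool
}

// parseOptions registers the proposed flags on a fresh FlagSet and parses args.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("kube-httpcache", flag.ContinueOnError)
	fs.BoolVar(&o.noFrontendConditionCheck, "no-frontend-condition-check", false,
		"include all frontend endpoint addresses, skipping the pod condition check")
	fs.BoolVar(&o.noBackendConditionCheck, "no-backend-condition-check", false,
		"include all backend endpoint addresses, skipping the pod condition check")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2) // the FlagSet has already printed usage to stderr
	}
	fmt.Printf("frontend check skipped: %v, backend check skipped: %v\n",
		o.noFrontendConditionCheck, o.noBackendConditionCheck)
}
```

Both flags default to false, so existing deployments keep the current filtering; Go's flag package accepts both the single-dash and double-dash spellings shown in the proposal.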
I could prepare a pull request with those CLI options if this solution is acceptable.