Athens should not check the status of all etcd endpoints on startup
uhthomas opened this issue · 0 comments
Describe the bug
etcd and its client are designed to be highly available and resilient. Manually checking the status of each endpoint on startup defeats these goals, because Athens will crash if only 2 of 3 endpoints of an etcd cluster are available. This behaviour makes rolling updates much harder.
Error Message
N/A - Athens will print an error that it cannot connect to a member of the etcd cluster.
To Reproduce
Steps to reproduce the behavior:
- Run multiple Athens instances and an etcd cluster.
- Restart both at the same time.
- Observe that Athens will not connect to etcd unless all endpoints are available.
Expected behavior
Athens should connect to the etcd cluster and defer connection management to the etcd client. The client should then automatically load balance and route requests to available members.
Environment (please complete the following information):
- OS: Linux
- Go version: N/A
- Proxy version: e248d22
- Storage: etcd and s3
Additional context
We run 5 Athens pods and 3 etcd pods in Kubernetes with high availability. When we update both images at the same time, the Athens deployment takes many minutes to progress because it crash loops until the etcd cluster is completely ready.