Athens should not check the status of all etcd endpoints on startup

Question

uhthomas opened this issue a year ago · 0 comments

Describe the bug

etcd and its client are designed to be highly available and resilient. Manually checking the status of each endpoint defeats these goals, because Athens will crash if only 2 of the 3 endpoints of an etcd cluster are available. This behaviour makes rolling updates much harder.

Error Message

N/A - Athens will print an error that it cannot connect to a member of the etcd cluster.

To Reproduce
Steps to reproduce the behavior:

Have multiple Athens instances and an etcd cluster.
Restart both at the same time.
Observe that Athens will not connect to etcd unless all endpoints are available.

Expected behavior

Athens should connect to the etcd cluster and defer connection management to the etcd client. The client should automatically load balance and route requests to available members.

Environment (please complete the following information):

OS: Linux
Go version: N/A
Proxy version: e248d22
Storage: etcd and s3

Additional context

We run 5 Athens pods and 3 etcd pods in Kubernetes with high availability. When we update both images at the same time, the Athens deployment takes many minutes to progress, because it crash loops until the etcd cluster is completely ready.