Idle Timeout for Elastic Load Balancers should be configurable
hobti01 opened this issue · 8 comments
When using Helm/Tiller, deployments may take longer than the 60 second default idle timeout when communicating with the API Server. While the precise issue is related to the API Server, it is logical to allow configuration of all ELBs.
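For context, the idle timeout on a classic ELB is a per-load-balancer attribute. A minimal sketch of how it could be set with aws-sdk-go (the load balancer name and the 300-second value here are placeholders, not what aws-operator actually does):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elb"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := elb.New(sess)

	// Set the connection idle timeout on a classic ELB.
	// "my-api-elb" and 300 are placeholder values.
	_, err := svc.ModifyLoadBalancerAttributes(&elb.ModifyLoadBalancerAttributesInput{
		LoadBalancerName: aws.String("my-api-elb"),
		LoadBalancerAttributes: &elb.LoadBalancerAttributes{
			ConnectionSettings: &elb.ConnectionSettings{
				IdleTimeout: aws.Int64(300), // seconds; classic ELB allows 1-4000
			},
		},
	})
	if err != nil {
		log.Fatalf("failed to set idle timeout: %v", err)
	}
}
```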
Looks like this also caused Calico (confd) issues. If the cluster was scaled, new workers could not properly join as BGP peers, because confd missed etcd events and did not reconfigure bird. We don't see this issue in on-prem guest clusters, because the load balancer there doesn't have this kind of timeout, or has a really long one (e.g. multiple hours).
etcd uses a different load balancer, not the one for ingress, so I'll create a separate issue.
The aws-operator change is deployed. We still need to set the timeout values in the cluster custom object. I'd propose:
- API - 300 seconds
- Etcd - 3600 seconds
- Ingress - 300 seconds
Setting etcd to 3600 secs will resolve #464 raised by Roman.
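For illustration only, the timeout fields in the cluster custom object could be shaped something like this; the actual field names in the aws-operator spec may differ:

```go
// Hypothetical shape for the idle timeout settings in the cluster
// custom object spec; real aws-operator field names may differ.
package spec

type AWSSpec struct {
	API     ELBSettings `json:"api"`
	Etcd    ELBSettings `json:"etcd"`
	Ingress ELBSettings `json:"ingress"`
}

type ELBSettings struct {
	// IdleTimeoutSeconds would be applied as the ELB connection idle
	// timeout (proposed above: API 300, Etcd 3600, Ingress 300).
	IdleTimeoutSeconds int `json:"idleTimeoutSeconds"`
}
```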
@r7vme @calvix Are you OK with these values?
@hobti01 Is 300 secs high enough for apiserver to resolve your Helm problems?
Etcd 3600 is OK, but I'm not sure about the others.
The AWS idle timeout is a last resort for dropping stuck connections (there is also kernel TCP keepalive behavior and application-level logic). On one hand, a short idle timeout can save us from some attacks. On the other hand, the API has a lot of functionality that uses long-lived connections (e.g. watches, logs, execs).
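As a rough way to observe the effect, here is a minimal probe sketch that holds a TCP connection idle and checks whether the load balancer has dropped it; the endpoint and wait time are placeholders:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

func main() {
	// Placeholder endpoint; point this at an ELB listener.
	conn, err := net.Dial("tcp", "my-elb.example.com:443")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Stay idle slightly longer than the configured idle timeout
	// (e.g. 70s vs. the 60s default).
	time.Sleep(70 * time.Second)

	// The write may still land in local buffers even after a drop,
	// so also attempt a read; an error on either indicates the
	// load balancer has closed the idle connection.
	if _, err := conn.Write([]byte("ping\n")); err != nil {
		fmt.Println("connection dropped:", err)
		return
	}
	conn.SetReadDeadline(time.Now().Add(5 * time.Second))
	buf := make([]byte, 1)
	if _, err := conn.Read(buf); err != nil {
		fmt.Println("connection dropped:", err)
		return
	}
	fmt.Println("connection still alive")
}
```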
I've searched Google for something like "best practices" for the k8s API. I only found that Deis recommends using 1200 sec. So from my side, I think it makes sense to start with 1200 sec for API and Ingress.
@r7vme Thanks, OK, let's go with 1200 for API and Ingress. I'll update kubernetesd to set these values.