ElasticSearch statefulset broken after 1 hour
tomfotherby opened this issue · 4 comments
I brought up a Kubernetes cluster with Tack in an existing VPC in us-east-1 and all was good until suddenly the first pod in the ElasticSearch StatefulSet
was killed.
I confirmed from CloudTrail
and the ASG Activity History
that the autoscaler had removed a worker which, by chance, had an ElasticSearch pod on it. I can see that the 25G EBS volume the StatefulSet's volumeClaimTemplates had provisioned is now unattached.
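As a sanity check, the volume's state can also be confirmed from the AWS side. This is just a sketch: it assumes the AWS CLI is configured for the same account and region, and relies on the kubernetes.io/created-for/pv/name tag that the in-tree AWS provisioner adds to dynamically provisioned volumes (the PV name comes from the FailedMount events further down):
# Confirm the dynamically provisioned EBS volume is no longer attached to any instance
$ aws ec2 describe-volumes \
    --filters "Name=tag:kubernetes.io/created-for/pv/name,Values=pvc-e88bb223-4c34-11e7-bb12-0afa88f15a64" \
    --query 'Volumes[].{Id:VolumeId,State:State,Attachments:Attachments}'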
The second ElasticSearch pod was assigned to a master node, so it is unaffected by scaling events. One solution would be to force both ElasticSearch pods onto master nodes.
Here we see the statefulset is broken:
$ kubectl get statefulset -n kube-system
NAME                    DESIRED   CURRENT   AGE
elasticsearch-logging   2         1         5h
The elasticsearch-logging-1 pod exists but the elasticsearch-logging-0 pod is missing:
$ kubectl get pods -n kube-system -l k8s-app=elasticsearch-logging
NAME                      READY     STATUS    RESTARTS   AGE
elasticsearch-logging-1   1/1       Running   0          5h
This command explains the cause of the failure, i.e. it's trying to attach an EBS volume to a now non-existent node:
$ kubectl get events -n kube-system
LASTSEEN   FIRSTSEEN   COUNT   NAME                      KIND   SUBOBJECT   TYPE      REASON        SOURCE          MESSAGE
3s         6h          195     elasticsearch-logging-0   Pod                Warning   FailedMount   attachdetach    Failed to attach volume "pvc-e88bb223-4c34-11e7-bb12-0afa88f15a64" on node "ip-10-56-0-138.ec2.internal" with: error finding instance ip-10-56-0-138.ec2.internal: instance not found
This command shows there is some problem deleting the node (even though it does not show up in kubectl get nodes):
$ kubectl get events
LASTSEEN   FIRSTSEEN   COUNT   NAME                          KIND   SUBOBJECT   TYPE     REASON         SOURCE              MESSAGE
3s         4h          3386    ip-10-56-0-138.ec2.internal   Node               Normal   DeletingNode   controllermanager   Node ip-10-56-0-138.ec2.internal event: Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider
(FYI: I think this log spam is a separate issue fixed in kubernetes/kubernetes#45923)
Checking the cluster-autoscaler status also shows there are still 6 registered nodes:
$ kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2017-06-08 17:11:20.848329171 +0000 UTC:
    Cluster-wide:
      Health:      Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=6)
    ...
I'm not sure how to tell Kubernetes to truly forget the old ip-10-56-0-138 worker node, or how to stop it trying to attach the volume to an instance that doesn't exist.
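For completeness, the manual cleanup I would expect to try looks roughly like the following. I can't confirm it breaks the deadlock (I gave up before the volume ever re-attached), so treat it as a sketch rather than a known fix:
# Remove the stale Node object the attach/detach controller keeps targeting
$ kubectl delete node ip-10-56-0-138.ec2.internal
# Force-delete the stuck pod so the StatefulSet controller recreates it on a live node
$ kubectl delete pod elasticsearch-logging-0 -n kube-system --grace-period=0 --force
# The PVC should stay Bound; the volume should then attach to whichever node gets the new pod
$ kubectl get pvc -n kube-system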
My cluster never recovered. There is some deadlock occurring between EBS persistent volumes and the autoscaler. I ran out of time and energy investigating and had to press on, so I ran make clean and re-created the cluster after changing addons/logging/elasticsearch-logging.yml to include a nodeSelector that forces the pods onto master nodes so the issue can't recur:
spec:
  nodeSelector:
    # Force the ES pods onto master nodes, because otherwise the autoscaler may
    # shut down the node and the StatefulSet is left unable to function,
    # due to a bug with EBS attachments or something, not sure exactly.
    node-role.kubernetes.io/master: ''
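After re-creating the cluster, something like this should confirm that both pods landed on master nodes (check the NODE column):
$ kubectl get pods -n kube-system -l k8s-app=elasticsearch-logging -o wide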
Feel free to close this issue if you think it's a rare event or a StatefulSet bug.
I'm not 100% sure, but I think my problem is fixed in PR kubernetes/kubernetes#46463:
Fix AWS EBS volumes not getting detached from node if routine to verify volumes are attached runs while the node is down.
I found it via the Kubernetes v1.7.0-beta.1 CHANGELOG, so hopefully the fix is coming 28/Jun/17.
There were some issues regarding the etcd version as well; the current version, 3.0.10, is problematic. Would you add an entry to update the etcd version too, like this:
- name: 10-environment.conf
  content: |
    [Service]
    Environment="ETCD_IMAGE_TAG=v3.0.17"
    Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${ fqdn }:2379"
    Environment="ETCD_CERT_FILE=/etc/ssl/certs/k8s-etcd.pem"
    Environment="ETCD_CLIENT_CERT_AUTH=true"
    ....
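Once that drop-in is in place, the running etcd version can be checked against its /version endpoint. A rough example follows; the CA and key paths are assumptions based on the cert path above, so adjust them to whatever the cluster actually uses:
# <etcd-fqdn> is whatever value gets substituted for ${ fqdn } in the unit above
$ curl --cacert /etc/ssl/certs/ca.pem \
       --cert /etc/ssl/certs/k8s-etcd.pem \
       --key /etc/ssl/certs/k8s-etcd-key.pem \
       https://<etcd-fqdn>:2379/version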