kz8s/tack

ElasticSearch statefulset broken after 1 hour

tomfotherby opened this issue · 4 comments

I brought up a Kubernetes cluster with Tack in an existing VPC in us-east-1 and all was good until suddenly the first pod in the ElasticSearch StatefulSet was killed.

I confirmed from CloudTrail and the ASG Activity History that the autoscaler had removed a Worker which, by chance, had an ElasticSearch pod on it. I can see that the 25G EBS volume the StatefulSet volumeClaimTemplates had provisioned is now unattached.
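
For context, the volumeClaimTemplates in addons/logging/elasticsearch-logging.yml is roughly the following (a sketch from memory; the claim name and the storage-class annotation are assumptions):

    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-logging                          # assumed claim name
        annotations:
          volume.beta.kubernetes.io/storage-class: default   # assumed, 1.6-era annotation
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 25Gi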

The second ElasticSearch pod was assigned to a master node, so it is unaffected by scaling events. One solution would be to force both ElasticSearch pods to use master nodes.

Here we see the statefulset is broken:

$ kubectl get statefulset -n kube-system
NAME                    DESIRED   CURRENT   AGE
elasticsearch-logging   2         1         5h

The elasticsearch-logging-1 pod exists but the elasticsearch-logging-0 pod is missing:

$ kubectl get pods -n kube-system -l k8s-app=elasticsearch-logging
NAME                      READY     STATUS    RESTARTS   AGE
elasticsearch-logging-1   1/1       Running   0          5h

This command explains the cause of the failure, i.e. it's trying to attach an EBS volume to a now non-existent node:

$ kubectl get events -n kube-system
LASTSEEN   FIRSTSEEN   COUNT     NAME                      KIND      SUBOBJECT   TYPE      REASON        SOURCE         MESSAGE
3s         6h          195       elasticsearch-logging-0   Pod                   Warning   FailedMount   attachdetach   Failed to attach volume "pvc-e88bb223-4c34-11e7-bb12-0afa88f15a64" on node "ip-10-56-0-138.ec2.internal" with: error finding instance ip-10-56-0-138.ec2.internal: instance not found
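
As a cross-check, the PV/PVC binding and the state of the EBS volume itself can be inspected like this (a sketch; it assumes the AWS provisioner tagged the volume with the usual kubernetes.io/created-for/* tags):

$ kubectl get pvc -n kube-system
$ kubectl get pv
$ aws ec2 describe-volumes \
    --filters Name=tag:kubernetes.io/created-for/pv/name,Values=pvc-e88bb223-4c34-11e7-bb12-0afa88f15a64 \
    --query 'Volumes[].State'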

This command shows there is some problem deleting the node (even though it does not show up in kubectl get nodes):

$ kubectl get events
LASTSEEN   FIRSTSEEN   COUNT     NAME                          KIND      SUBOBJECT   TYPE      REASON         SOURCE              MESSAGE
3s         4h          3386      ip-10-56-0-138.ec2.internal   Node                  Normal    DeletingNode   controllermanager   Node ip-10-56-0-138.ec2.internal event: Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider

(FYI: I think this log spam is a separate issue fixed in kubernetes/kubernetes#45923)

Checking the autoscaler info also shows there are still 6 registered nodes:

$ kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2017-06-08 17:11:20.848329171 +0000 UTC:
    Cluster-wide:
      Health:      Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=6)
...

I'm not sure how to tell Kubernetes to truly forget the old ip-10-56-0-138 worker node or to stop trying to mount the volume to an instance that doesn't exist.
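
For the record, a couple of things that might nudge it out of this state (untested, just a sketch):

# Force-delete the stuck pod so the StatefulSet controller reschedules it
$ kubectl delete pod elasticsearch-logging-0 -n kube-system --grace-period=0 --force

# Remove the stale node object, if one is still lingering in the API
$ kubectl delete node ip-10-56-0-138.ec2.internal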

My cluster never recovered. There is some deadlock occurring between EBS persistent volumes and the autoscaler. I ran out of time and energy investigating and had to press on, so I did make clean and re-created the cluster after changing addons/logging/elasticsearch-logging.yml to include a nodeSelector that forces the ElasticSearch pods onto a master node so the issue can't recur:

    spec:
      nodeSelector:
        # Force ES pods onto master nodes, because otherwise the autoscaler may
        # shut down the node and the StatefulSet is left unable to function
        # due to a bug with EBS attachments or something, not sure exactly.
        node-role.kubernetes.io/master: ''
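
If the masters had the usual node-role.kubernetes.io/master:NoSchedule taint, the pods would also need a matching toleration. Mine scheduled onto a master without one, so this is only for completeness (a sketch):

    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule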

Feel free to close this issue if you think it's a rare event or a StatefulSet bug.

I'm not 100% sure, but I think my problem is fixed in PR kubernetes/kubernetes#46463:

Fix AWS EBS volumes not getting detached from node if routine to verify volumes are attached runs while the node is down.

I found this in the Kubernetes v1.7.0-beta.1 CHANGELOG, so hopefully the fix is coming on 28/Jun/17.

cemo commented

There were some issues regarding the etcd version as well. The current version, 3.0.10, is problematic.

Would you add an entry to update your etcd version as well?

        - name: 10-environment.conf
          content: |
            [Service]
            Environment="ETCD_IMAGE_TAG=v3.0.17"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${ fqdn }:2379"
            Environment="ETCD_CERT_FILE=/etc/ssl/certs/k8s-etcd.pem"
            Environment="ETCD_CLIENT_CERT_AUTH=true"
           ....

like this.
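
You can double-check which version the masters actually end up running by hitting etcd's version endpoint from a master (the cert paths here are guesses, adjust to your cluster):

$ curl --cacert /etc/ssl/certs/ca.pem \
       --cert /etc/ssl/certs/k8s-etcd.pem \
       --key /etc/ssl/certs/k8s-etcd-key.pem \
       https://<master-fqdn>:2379/version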

bruj0 commented

I too have problems with StatefulSets, described at #185, but I think that's a Kubernetes problem. Regarding the autoscaler, I had problems with it too, so I just disabled it entirely.
I don't think it's ready for production yet.
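
In case anyone wants to do the same, scaling the addon to zero should be enough (assuming it runs as a Deployment called cluster-autoscaler in kube-system):

$ kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0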