kz8s/tack

ElasticSearch statefulset broken after 1 hour

tomfotherby opened this issue · 4 comments

I brought up a Kubernetes cluster with Tack in an existing VPC in us-east-1 and all was good until suddenly the first pod in the ElasticSearch StatefulSet was killed.

I confirmed from CloudTrail and the ASG Activity History that the autoscaler had removed a Worker which, by chance, had an ElasticSearch pod on it. I can see that the 25G EBS volume the StatefulSet volumeClaimTemplates had provisioned is now unattached.
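
For context, the volumeClaimTemplates in addons/logging/elasticsearch-logging.yml is roughly the following (a sketch from memory; the claim name and the storage-class annotation are assumptions):

    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-logging                          # assumed claim name
        annotations:
          volume.beta.kubernetes.io/storage-class: default   # assumed, 1.6-era annotation
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 25Gi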

The second ElasticSearch pod was assigned to a master node, so it is unaffected by scaling events. One solution would be to force both ElasticSearch pods to use master nodes.

Here we see the statefulset is broken:

$ kubectl get statefulset -n kube-system
NAME                    DESIRED   CURRENT   AGE
elasticsearch-logging   2         1         5h

The elasticsearch-logging-1 pod exists but the elasticsearch-logging-0 pod is missing:

$ kubectl get pods -n kube-system -l k8s-app=elasticsearch-logging
NAME                      READY     STATUS    RESTARTS   AGE
elasticsearch-logging-1   1/1       Running   0          5h

This command explains the cause of the failure, i.e. it's trying to attach an EBS volume to a now non-existent node:

$ kubectl get events -n kube-system
LASTSEEN   FIRSTSEEN   COUNT     NAME                      KIND      SUBOBJECT   TYPE      REASON        SOURCE         MESSAGE
3s         6h          195       elasticsearch-logging-0   Pod                   Warning   FailedMount   attachdetach   Failed to attach volume "pvc-e88bb223-4c34-11e7-bb12-0afa88f15a64" on node "ip-10-56-0-138.ec2.internal" with: error finding instance ip-10-56-0-138.ec2.internal: instance not found
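
As a cross-check, the PV/PVC binding and the state of the EBS volume itself can be inspected like this (a sketch; it assumes the AWS provisioner tagged the volume with the usual kubernetes.io/created-for/* tags):

$ kubectl get pvc -n kube-system
$ kubectl get pv
$ aws ec2 describe-volumes \
    --filters Name=tag:kubernetes.io/created-for/pv/name,Values=pvc-e88bb223-4c34-11e7-bb12-0afa88f15a64 \
    --query 'Volumes[].State'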

This command shows there is some problem deleting the node (even though it does not show up in kubectl get nodes):

$ kubectl get events
LASTSEEN   FIRSTSEEN   COUNT     NAME                          KIND      SUBOBJECT   TYPE      REASON         SOURCE              MESSAGE
3s         4h          3386      ip-10-56-0-138.ec2.internal   Node                  Normal    DeletingNode   controllermanager   Node ip-10-56-0-138.ec2.internal event: Deleting Node ip-10-56-0-138.ec2.internal because it's not present according to cloud provider

(FYI: I think this log spam is a separate issue fixed in kubernetes/kubernetes#45923)

Checking the autoscaler info also shows there are still 6 registered nodes:

$ kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2017-06-08 17:11:20.848329171 +0000 UTC:
    Cluster-wide:
      Health:      Healthy (ready=5 unready=0 notStarted=0 longNotStarted=0 registered=6)
...

I'm not sure how to tell Kubernetes to truly forget the old ip-10-56-0-138 worker node or to stop trying to mount the volume to an instance that doesn't exist.
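
For the record, a couple of things that might nudge it out of this state (untested, just a sketch):

# Force-delete the stuck pod so the StatefulSet controller reschedules it
$ kubectl delete pod elasticsearch-logging-0 -n kube-system --grace-period=0 --force

# Remove the stale node object, if one is still lingering in the API
$ kubectl delete node ip-10-56-0-138.ec2.internal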

My cluster never recovered. There is some deadlock occurring between EBS persistent volumes and the autoscaler. I ran out of time and energy investigating and had to press on, so I did make clean and re-created the cluster after changing addons/logging/elasticsearch-logging.yml to include a nodeSelector that forces the ElasticSearch pods onto a master node so the issue can't recur:

    spec:
      nodeSelector:
        # Force ES pods onto master nodes, because otherwise the autoscaler may
        # shut down the node and the StatefulSet is left unable to function
        # due to a bug with EBS attachments or something, not sure exactly.
        node-role.kubernetes.io/master: ''
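
If the masters had the usual node-role.kubernetes.io/master:NoSchedule taint, the pods would also need a matching toleration. Mine scheduled onto a master without one, so this is only for completeness (a sketch):

    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule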

Feel free to close this issue if you think it's a rare event or a StatefulSet bug.

I'm not 100% sure, but I think my problem is fixed in PR kubernetes/kubernetes#46463:

Fix AWS EBS volumes not getting detached from node if routine to verify volumes are attached runs while the node is down.

I found this in the Kubernetes v1.7.0-beta.1 CHANGELOG, so hopefully the fix is coming on 28/Jun/17.

cemo commented

There were some issues regarding the etcd version as well. The current version, 3.0.10, is problematic.

Would you add an entry to update your etcd version as well?

        - name: 10-environment.conf
          content: |
            [Service]
            Environment="ETCD_IMAGE_TAG=v3.0.17"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=https://${ fqdn }:2379"
            Environment="ETCD_CERT_FILE=/etc/ssl/certs/k8s-etcd.pem"
            Environment="ETCD_CLIENT_CERT_AUTH=true"
           ....

like this.
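
You can double-check which version the masters actually end up running by hitting etcd's version endpoint from a master (the cert paths here are guesses, adjust to your cluster):

$ curl --cacert /etc/ssl/certs/ca.pem \
       --cert /etc/ssl/certs/k8s-etcd.pem \
       --key /etc/ssl/certs/k8s-etcd-key.pem \
       https://<master-fqdn>:2379/version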

bruj0 commented

I too have problems with StatefulSets, described at #185, but I think that's a Kubernetes problem. Regarding the autoscaler, I had problems with it too, so I just disabled it entirely.
I don't think it's ready for production yet.
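
In case anyone wants to do the same, scaling the addon to zero should be enough (assuming it runs as a Deployment called cluster-autoscaler in kube-system):

$ kubectl -n kube-system scale deployment cluster-autoscaler --replicas=0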