kubernetes-retired/external-storage

[efs-provisioner] EKS - timeout on mount

mlaythe opened this issue · 6 comments

Hi,

I've been running the efs-provisioner on my cluster for a few days and it suddenly stopped working. It started throwing this error:

Unable to mount volumes for pod "<pod_name>": timeout expired waiting for volumes to attach or mount for pod "prod"/"<pod_name>". list of unmounted volumes=[efs-pvc]. list of unattached volumes=[efs-pvc default-token-wxwxc]

I'm currently sharing one EFS file system between two namespaces:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv
  annotations:
    pv.kubernetes.io/provisioned-by: "aws-efs"
spec:
  capacity:
    storage: 1Mi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: aws-efs
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /efs-pvc-<claim_id>
    server: <fs_id>.efs.us-west-2.amazonaws.com

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pv-claim
  namespace: demo
  annotations:
    volume.beta.kubernetes.io/storage-class: "aws-efs"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi

---

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prod-pv
  annotations:
    pv.kubernetes.io/provisioned-by: "aws-efs"
spec:
  capacity:
    storage: 1Mi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: aws-efs
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /efs-pvc-<claim_id>
    server: <fs_id>.efs.us-west-2.amazonaws.com

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prod-pv-claim
  namespace: prod
  annotations:
    volume.beta.kubernetes.io/storage-class: "aws-efs"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
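
For completeness, the binding status of these objects can be checked with something along these lines (the names match the manifests above):

# Verify that each PVC is Bound to its statically created PV
kubectl get pv demo-pv prod-pv
kubectl -n demo get pvc demo-pv-claim
kubectl -n prod get pvc prod-pv-claim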

I'm able to manually mount it on a separate instance and cd into the path I specified. I'm unsure how to check the kubelet logs on EKS. Any help would be greatly appreciated, thanks!
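
The manual mount test was roughly along these lines (the file system ID is a placeholder, and the mount options mirror the PV spec):

# Mount the EFS file system over NFSv4.1 from an instance in the same VPC
sudo mkdir -p /mnt/efs-test
sudo mount -t nfs4 -o nfsvers=4.1,hard \
  <fs_id>.efs.us-west-2.amazonaws.com:/ /mnt/efs-test
# The provisioned directories (e.g. /efs-pvc-<claim_id>) should be visible here
ls /mnt/efs-test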

It seems the EFS drive can't be mounted when the pod is scheduled to certain nodes, while it works fine on other nodes. What could cause this stuck state on a node? Getting the kubelet logs would require SSH access, which I didn't configure initially, and redeploying the node group is unfortunately out of the question.
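
One way to narrow down which nodes are affected (pod and namespace names are placeholders):

# Show which node the stuck pod was scheduled to
kubectl -n prod get pod <pod_name> -o wide
# The mount timeout shows up as a FailedMount event on the pod
kubectl -n prod describe pod <pod_name>
# Recent events across the namespace, to see whether other nodes fail the same way
kubectl -n prod get events --sort-by=.metadata.creationTimestamp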

Yeah, I added port 2049 access to the security group for my EFS drive. It used to work before, so all permissions were set up correctly prior to this.
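
To double-check, the inbound rules on the security group attached to the EFS mount targets can be inspected like this (the group ID is a placeholder); there should be a rule allowing TCP 2049 from the worker nodes' security group or subnet CIDR:

# List the mount targets, then the inbound rules on their security group
aws efs describe-mount-targets --file-system-id <fs_id> --region us-west-2
aws ec2 describe-security-groups --group-ids <sg_id> \
  --query 'SecurityGroups[0].IpPermissions' --region us-west-2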

kubelet logs should be handled by journald. Run journalctl -u kubelet on the node and there should be log messages about the mount operation failing.
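
For example, something along these lines on the affected node (assuming the kubelet runs as a systemd unit named kubelet, which is the default on the EKS-optimized AMI):

# Dump kubelet logs and filter for mount/NFS errors
journalctl -u kubelet --no-pager | grep -iE 'mount|nfs' | tail -n 50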

I'm unable to SSH into the worker nodes because I didn't set that up initially on EKS.

Sorry, it looks like my EFS file system ran out of burst credits, hence the weird behavior. Thanks for your help and time!
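
In case anyone else hits this: the remaining burst credits can be checked via the BurstCreditBalance CloudWatch metric, for example (file system ID and region are placeholders):

# Minimum BurstCreditBalance over the last hour for the file system
aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name BurstCreditBalance \
  --dimensions Name=FileSystemId,Value=<fs_id> \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 --statistics Minimum \
  --region us-west-2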