kubernetes-csi/node-driver-registrar

ECONNREFUSED when deploying a CSI node plugin with node-driver-registrar

ltson4121994 opened this issue · 1 comment

I am trying to deploy a simple CSI plugin, but my node server does not seem to work properly with node-driver-registrar.

This is my .yaml file:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: demo-csi-node
spec:
  selector:
    matchLabels:
      app: demo-csi-node
  template:
    metadata:
      labels:
        app: demo-csi-node
    spec:
      serviceAccountName: demo-csi-sa
      containers:
        - name: node-driver-registrar
          image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.5.0
          args:
            - --csi-address=/csi/csi.sock
            - --kubelet-registration-path=/var/lib/kubelet/plugins/demo-csi/csi.sock
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
        - name: demo-csi-node
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          image: ltson1/demo-csi-img
          args:
            - "--endpoint=$(CSI_ENDPOINT)"
          env:
            - name: CSI_ENDPOINT
              value: unix:///csi/csi.sock
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: pods-mount-dir
              mountPath: /var/lib/kubelet/
              mountPropagation: "Bidirectional"
      volumes:
        - name: socket-dir
          hostPath:
            path: /var/lib/kubelet/plugins/demo-csi
            type: DirectoryOrCreate
        - name: pods-mount-dir
          hostPath:
            path: /var/lib/kubelet/
            type: Directory
        - hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: Directory
          name: registration-dir
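Both containers mount the same `socket-dir` hostPath volume at `/csi`, so the plugin's `--endpoint` (taken from `CSI_ENDPOINT`) and the registrar's `--csi-address` have to resolve to the same file. The Go sketch below checks that, assuming the plugin interprets its endpoint as a URL with a `unix` scheme; `parseEndpoint` is a hypothetical helper for illustration, not code from the plugin or from node-driver-registrar.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseEndpoint is a hypothetical helper: it turns an endpoint such as
// "unix:///csi/csi.sock" into the network and address net.Listen expects.
func parseEndpoint(endpoint string) (network, address string, err error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "unix" {
		return "", "", fmt.Errorf("unsupported endpoint scheme %q", u.Scheme)
	}
	// "unix:///csi/csi.sock" parses to an empty host and path "/csi/csi.sock";
	// "unix://csi/csi.sock" would put "csi" in the host, which is a mistake.
	if u.Host != "" || u.Path == "" {
		return "", "", fmt.Errorf("expected unix:///absolute/path, got %q", endpoint)
	}
	return "unix", u.Path, nil
}

func main() {
	// CSI_ENDPOINT value from the DaemonSet above.
	network, address, err := parseEndpoint("unix:///csi/csi.sock")
	if err != nil {
		panic(err)
	}
	// Value passed to node-driver-registrar via --csi-address.
	registrarAddr := "/csi/csi.sock"
	fmt.Println(network, address, "matches registrar:", address == registrarAddr)
}
```

In this manifest the two paths agree, which suggests the socket location itself is configured consistently.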

This is the output when I run strace with node-driver-registrar:

connect(3, {sa_family=AF_UNIX, sun_path="/csi/csi.sock"}, 16) = -1 ECONNREFUSED (Connection refused)
close(3)                                = 0
epoll_pwait(4, [], 128, 375, NULL, 1)   = 0
futex(0xc000050550, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_pwait(4, [{EPOLLIN, {u32=16476080, u64=16476080}}], 128, 4999, NULL, 2) = 1
read(5, "\0", 16)                       = 1
epoll_pwait(4, [], 128, 556, NULL, 87161855399915) = 0
futex(0xc000050550, FUTEX_WAKE_PRIVATE, 1) = 1
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/csi/csi.sock"}, 16) = -1 ECONNREFUSED (Connection refused)
close(3)                                = 0
epoll_pwait(4, [], 128, 1010, NULL, 1)  = 0
futex(0xc000050550, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000050950, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000050950, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000050950, FUTEX_WAKE_PRIVATE, 1) = 1
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/csi/csi.sock"}, 16) = -1 ECONNREFUSED (Connection refused)
close(3)                                = 0
futex(0xc000050950, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000050950, FUTEX_WAKE_PRIVATE, 1) = 1
write(6, "\0", 1)                       = 1
futex(0xf86ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xf86ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=3000}, NULL) = 0
futex(0xf86ed0, FUTEX_WAIT_PRIVATE, 0, NULL^Cstrace: Process 1647459 detached
 <detached ...>

I don't have much of a clue how to debug this and have been stuck for quite a while; any help would be much appreciated. Thanks.

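It turns out this issue was caused by the socket not being released from a previous run: the pod was terminated before the socket was closed, so a stale socket file was left in the hostPath directory /var/lib/kubelet/plugins/demo-csi (mounted at /csi), where it survived the pod restart.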