Azure/kubernetes-volume-drivers

blobfuse multiple mounts race condition issue


What happened:

When I create a pod with a writable blobfuse mount, the container won't start, and I get errors like: Unable to mount volumes for pod... list of unmounted volumes=[MY_WRITE_MOUNTED_BLOB-CONTAINER]...

Here is an example from kubectl describe on the pod. Note that this is a complicated pod (we are using it in a JupyterHub environment) with multiple mounts, including several read-only blobfuse mounts. scratch-blob is the only writable blobfuse mount.

Events:
  Type     Reason             Age                From                                      Message
  ----     ------             ----               ----                                      -------
  Warning  FailedScheduling   55m (x2 over 56m)  default-scheduler                         pod has unbound immediate PersistentVolumeClaims
  Normal   NotTriggerScaleUp  55m                cluster-autoscaler                        pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Normal   Scheduled          55m                default-scheduler                         Successfully assigned panzure-dev/jupyter-tam203 to aks-default-71061816-vmss000000
  Warning  FailedMount        51m (x2 over 53m)  kubelet, aks-default-71061816-vmss000000  Unable to mount volumes for pod "jupyter-tam203_panzure-dev(89a09755-dac2-11e9-a307-9ac9506c4c50)": timeout expired waiting for volumes to attach or mount for pod "panzure-dev"/"jupyter-tam203". list of unmounted volumes=[scratch-blob]. list of unattached volumes=[volume-tam203 scratch daskernetes-config mo-uki-radar-comp-3yr scratch-blob aws-earth-nc-files daskkubernetes-token-2g854]
  Warning  FailedMount        49m                kubelet, aks-default-71061816-vmss000000  Unable to mount volumes for pod "jupyter-tam203_panzure-dev(89a09755-dac2-11e9-a307-9ac9506c4c50)": timeout expired waiting for volumes to attach or mount for pod "panzure-dev"/"jupyter-tam203". list of unmounted volumes=[volume-tam203 scratch daskernetes-config mo-uki-radar-comp-3yr scratch-blob aws-earth-nc-files daskkubernetes-token-2g854]. list of unattached volumes=[volume-tam203 scratch daskernetes-config mo-uki-radar-comp-3yr scratch-blob aws-earth-nc-files daskkubernetes-token-2g854]
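The two FailedMount messages list volumes differently: the first reports only scratch-blob as unmounted, while the second reports every volume. A minimal sketch for pulling both lists out of such a message, assuming the kubelet wording stays exactly as shown above; parse_failed_mount is a hypothetical helper, not part of kubectl or the blobfuse driver.

```python
import re
import sys


def parse_failed_mount(message: str) -> dict:
    """Split a kubelet FailedMount message into its two volume lists.

    Assumes the wording seen in this issue: "list of unmounted volumes=[...]"
    and "list of unattached volumes=[...]", with names separated by spaces.
    """
    lists = {}
    for kind in ("unmounted", "unattached"):
        match = re.search(rf"list of {kind} volumes=\[([^\]]*)\]", message)
        # A missing list is reported as empty rather than raising.
        lists[kind] = match.group(1).split() if match else []
    return lists


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python parse_failed_mount.py "<message copied from kubectl describe>"
    result = parse_failed_mount(sys.argv[1])
    print("unmounted:", result["unmounted"])
    print("unattached:", result["unattached"])
```

Comparing the two lists across successive events shows whether a single volume (here scratch-blob) is holding up the pod or whether every mount is stuck.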

The pod then takes hours to be deleted. It was still alive 55 minutes after running kubectl delete po, and it was gone 3 hours later. I don't know exactly how long it took, because most times I tear down the cluster instead: that is slow, but still quicker than waiting. A force delete makes the pod disappear, but I don't think it is really deleted; for example, the namespace can then no longer be deleted. The only solutions I've found are tearing down my AKS cluster and rebuilding it from scratch, or waiting hours (exact number unknown).

I don't believe this is an issue with the secrets/permissions because, as detailed below, I can make variants that do work with the same blob containers and secrets.

What you expected to happen:

The pod starts normally and is deleted promptly when a delete command is sent.

How to reproduce it:

It's difficult to reproduce, as things work in a simpler pod. However, below is the output of kubectl get po -o yaml for a pod that works and one that doesn't. The only difference is the readOnly setting of scratch-blob: readOnly: false (fails) versus readOnly: true (works). In the failing spec the field is simply omitted, which means false.

Some environment variables and similar values have been redacted because I was concerned they were sensitive.

This works:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    hub.jupyter.org/username: theusername
  creationTimestamp: "2019-09-19T10:30:44Z"
  labels:
    app: jupyterhub
    chart: jupyterhub-v0.7-1656a02
    component: singleuser-server
    heritage: jupyterhub
    hub.jupyter.org/network-access-hub: "true"
    release: panzure-dev
  name: jupyter-theusername
  namespace: panzure-dev
  resourceVersion: "85783"
  selfLink: /api/v1/namespaces/panzure-dev/pods/jupyter-theusername
  uid: 89a3a078-dac8-11e9-a307-9ac9506c4c50
spec:
  containers:
  - args:
    - jupyter-labhub
    - --ip="0.0.0.0"
    - --port=8888
    - --NotebookApp.default_url="/lab"
    env:
    - name: EMAIL
      value: theusername@local
    - name: GIT_AUTHOR_NAME
      value: theusername
    - name: GIT_COMMITTER_NAME
      value: theusername
    - name: DASK_KUBERNETES__DIAGNOSTICS_LINK
      value: '{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status'
    - name: DASK_KUBERNETES__WORKER_NAME
      value: dask-{JUPYTERHUB_USER}-{uuid}
    - name: DASK_KUBERNETES__WORKER_TEMPLATE_PATH
      value: /etc/daskernetes/worker-template.yaml
    - name: EXAMPLES_GIT_URL
      value: https://github.com/informatics-lab/example-notebooks.git
    - name: JUPYTERHUB_ADMIN_ACCESS
      value: "1"
    - name: JUPYTERHUB_CLIENT_ID
      value: jupyterhub-user-theusername
    - name: JUPYTERHUB_HOST
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: /user/theusername/oauth_callback
    - name: JUPYTERHUB_USER
      value: theusername
    - name: JUPYTERHUB_API_URL
      value: http://blahblah:8081/hub/api
    - name: JUPYTERHUB_BASE_URL
      value: /
    - name: JUPYTERHUB_SERVICE_PREFIX
      value: /user/theusername/
    - name: MEM_LIMIT
      value: "17179869184"
    - name: MEM_GUARANTEE
      value: "2147483648"
    - name: CPU_LIMIT
      value: "2.0"
    - name: CPU_GUARANTEE
      value: "1.0"
    - name: JUPYTER_IMAGE_SPEC
      value: informaticslab/pangeo-notebook:0.8.1
    image: informaticslab/pangeo-notebook:0.8.1
    imagePullPolicy: IfNotPresent
    lifecycle: {}
    name: notebook
    ports:
    - containerPort: 8888
      name: notebook-port
      protocol: TCP
    resources:
      limits:
        cpu: "2"
        memory: "17179869184"
      requests:
        cpu: "1"
        memory: "2147483648"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /home/jovyan
      name: volume-theusername
    - mountPath: /scratch
      name: scratch
    - mountPath: /etc/daskernetes
      name: daskernetes-config
      readOnly: true
    - mountPath: /mo-uki-radar-comp-3yr
      name: mo-uki-radar-comp-3yr
      readOnly: true
    - mountPath: /scratch-blob
      name: scratch-blob
      readOnly: true
    - mountPath: /aws-earth-nc-files
      name: aws-earth-nc-files
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: daskkubernetes-token-2g854
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: singleuser-image-credentials
  initContainers:
  - command:
    - iptables
    - -A
    - OUTPUT
    - -d
    - 169.254.169.254
    - -j
    - DROP
    image: jupyterhub/k8s-network-tools:c7f70f9
    imagePullPolicy: IfNotPresent
    name: block-cloud-metadata
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
      privileged: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: daskkubernetes-token-2g854
      readOnly: true
  nodeName: aks-default-71061816-vmss000000
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 100
    runAsGroup: 0
    runAsUser: 1000
  serviceAccount: daskkubernetes
  serviceAccountName: daskkubernetes
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: volume-theusername
    persistentVolumeClaim:
      claimName: claim-theusername
  - name: scratch
    persistentVolumeClaim:
      claimName: scratch
  - configMap:
      defaultMode: 420
      name: daskernetes-config
    name: daskernetes-config
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: mo-uki-radar-comp-3yr
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/mo-uki-radar-comp-3yr
      readOnly: true
      secretRef:
        name: blobfusecreds
    name: mo-uki-radar-comp-3yr
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: scratch
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/scratch-blob
      readOnly: true
      secretRef:
        name: blobfusecreds
    name: scratch-blob
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: aws-earth-nc-files
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/aws-earth-nc-files
      readOnly: true
      secretRef:
        name: earthblobfusecreds
    name: aws-earth-nc-files
  - name: daskkubernetes-token-2g854
    secret:
      defaultMode: 420
      secretName: daskkubernetes-token-2g854
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-09-19T10:30:50Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-09-19T10:30:51Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-09-19T10:30:51Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-09-19T10:30:46Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://blahblahblahblahblahblahblah
    image: informaticslab/pangeo-notebook:0.8.1
    imageID: docker-pullable://informaticslab/pangeo-notebook@sha256:blahblahblahblah
    lastState: {}
    name: notebook
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2019-09-19T10:30:50Z"
  hostIP: 10.240.0.4
  initContainerStatuses:
  - containerID: docker://blahblahblahblahblah
    image: jupyterhub/k8s-network-tools:c7f70f9
    imageID: docker-pullable://jupyterhub/k8s-network-tools@sha256:blahblahblahblah
    lastState: {}
    name: block-cloud-metadata
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://blahblahblahblah
        exitCode: 0
        finishedAt: "2019-09-19T10:30:49Z"
        reason: Completed
        startedAt: "2019-09-19T10:30:49Z"
  phase: Running
  podIP: 10.244.0.22
  qosClass: Burstable
  startTime: "2019-09-19T10:30:46Z"

This doesn't:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    hub.jupyter.org/username: theusername
  creationTimestamp: "2019-09-19T09:47:47Z"
  deletionGracePeriodSeconds: 1
  deletionTimestamp: "2019-09-19T09:52:46Z"
  labels:
    app: jupyterhub
    chart: jupyterhub-v0.7-1656a02
    component: singleuser-server
    heritage: jupyterhub
    hub.jupyter.org/network-access-hub: "true"
    release: panzure-dev
  name: jupyter-theusername
  namespace: panzure-dev
  resourceVersion: "81021"
  selfLink: /api/v1/namespaces/panzure-dev/pods/jupyter-theusername
  uid: blahblah
spec:
  containers:
  - args:
    - jupyter-labhub
    - --ip="0.0.0.0"
    - --port=8888
    - --NotebookApp.default_url="/lab"
    env:
    - name: EMAIL
      value: theusername@local
    - name: GIT_AUTHOR_NAME
      value: theusername
    - name: GIT_COMMITTER_NAME
      value: theusername
    - name: DASK_KUBERNETES__DIAGNOSTICS_LINK
      value: '{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status'
    - name: DASK_KUBERNETES__WORKER_NAME
      value: dask-{JUPYTERHUB_USER}-{uuid}
    - name: DASK_KUBERNETES__WORKER_TEMPLATE_PATH
      value: /etc/daskernetes/worker-template.yaml
    - name: EXAMPLES_GIT_URL
      value: https://github.com/informatics-lab/example-notebooks.git
    - name: JUPYTERHUB_ADMIN_ACCESS
      value: "1"
    - name: JUPYTERHUB_CLIENT_ID
      value: jupyterhub-user-theusername
    - name: JUPYTERHUB_HOST
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: /user/theusername/oauth_callback
    - name: JUPYTERHUB_USER
      value: theusername
    - name: JUPYTERHUB_API_URL
      value: http://blahblah:8081/hub/api
    - name: JUPYTERHUB_BASE_URL
      value: /
    - name: JUPYTERHUB_SERVICE_PREFIX
      value: /user/theusername/
    - name: MEM_LIMIT
      value: "17179869184"
    - name: MEM_GUARANTEE
      value: "2147483648"
    - name: CPU_LIMIT
      value: "2.0"
    - name: CPU_GUARANTEE
      value: "1.0"
    - name: JUPYTER_IMAGE_SPEC
      value: informaticslab/pangeo-notebook:0.8.1
    image: informaticslab/pangeo-notebook:0.8.1
    imagePullPolicy: IfNotPresent
    lifecycle: {}
    name: notebook
    ports:
    - containerPort: 8888
      name: notebook-port
      protocol: TCP
    resources:
      limits:
        cpu: "2"
        memory: "17179869184"
      requests:
        cpu: "1"
        memory: "2147483648"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /home/jovyan
      name: volume-theusername
    - mountPath: /scratch
      name: scratch
    - mountPath: /etc/daskernetes
      name: daskernetes-config
      readOnly: true
    - mountPath: /mo-uki-radar-comp-3yr
      name: mo-uki-radar-comp-3yr
      readOnly: true
    - mountPath: /scratch-blob
      name: scratch-blob
    - mountPath: /aws-earth-nc-files
      name: aws-earth-nc-files
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: daskkubernetes-token-2g854
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: singleuser-image-credentials
  initContainers:
  - command:
    - iptables
    - -A
    - OUTPUT
    - -d
    - 169.254.169.254
    - -j
    - DROP
    image: jupyterhub/k8s-network-tools:c7f70f9
    imagePullPolicy: IfNotPresent
    name: block-cloud-metadata
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
      privileged: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: daskkubernetes-token-2g854
      readOnly: true
  nodeName: aks-default-71061816-vmss000000
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 100
    runAsGroup: 0
    runAsUser: 1000
  serviceAccount: daskkubernetes
  serviceAccountName: daskkubernetes
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: volume-theusername
    persistentVolumeClaim:
      claimName: claim-theusername
  - name: scratch
    persistentVolumeClaim:
      claimName: scratch
  - configMap:
      defaultMode: 420
      name: daskernetes-config
    name: daskernetes-config
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: mo-uki-radar-comp-3yr
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/mo-uki-radar-comp-3yr
      readOnly: true
      secretRef:
        name: blobfusecreds
    name: mo-uki-radar-comp-3yr
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: scratch
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/scratch-blob
      secretRef:
        name: blobfusecreds
    name: scratch-blob
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: aws-earth-nc-files
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/aws-earth-nc-files
      readOnly: true
      secretRef:
        name: earthblobfusecreds
    name: aws-earth-nc-files
  - name: daskkubernetes-token-2g854
    secret:
      defaultMode: 420
      secretName: daskkubernetes-token-2g854
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-09-19T09:47:57Z"
    message: 'containers with incomplete status: [block-cloud-metadata]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-09-19T09:47:57Z"
    message: 'containers with unready status: [notebook]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-09-19T09:47:57Z"
    message: 'containers with unready status: [notebook]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-09-19T09:47:57Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: informaticslab/pangeo-notebook:0.8.1
    imageID: ""
    lastState: {}
    name: notebook
    ready: false
    restartCount: 0
    state:
      terminated:
        exitCode: 0
        finishedAt: null
        startedAt: null
  hostIP: 10.240.0.4
  initContainerStatuses:
  - image: jupyterhub/k8s-network-tools:c7f70f9
    imageID: ""
    lastState: {}
    name: block-cloud-metadata
    ready: false
    restartCount: 0
    state:
      terminated:
        exitCode: 0
        finishedAt: null
        startedAt: null
  phase: Pending
  qosClass: Burstable
  startTime: "2019-09-19T09:47:57Z"

These pods are spawned from JupyterHub, so they are a little funky. The only difference in the JupyterHub configuration between these two is readOnly: true on scratch-blob under extraVolumes and extraVolumeMounts.
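Because the two pod specs are long, a small script can confirm that the readOnly flag on scratch-blob really is the only difference. A minimal sketch, assuming both pods are exported with kubectl get pod <name> -o json (JSON rather than YAML so only the Python standard library is needed); readonly_flags and diff_readonly are hypothetical helper names, and only regular containers (not initContainers) are inspected.

```python
import json
import sys


def readonly_flags(pod: dict) -> dict:
    """Map volume name -> (volumeMount readOnly, flexVolume readOnly).

    An omitted readOnly field is treated as False (the Kubernetes default);
    None means the volume has no mount, or is not a flexVolume.
    Only spec.containers is inspected, not spec.initContainers.
    """
    mounts = {}
    for container in pod["spec"].get("containers", []):
        for vm in container.get("volumeMounts", []):
            mounts[vm["name"]] = vm.get("readOnly", False)
    flex = {}
    for vol in pod["spec"].get("volumes", []):
        if "flexVolume" in vol:
            flex[vol["name"]] = vol["flexVolume"].get("readOnly", False)
    names = mounts.keys() | flex.keys()
    return {name: (mounts.get(name), flex.get(name)) for name in names}


def diff_readonly(pod_a: dict, pod_b: dict) -> list:
    """Sorted names of volumes whose readOnly flags differ between two pods."""
    a, b = readonly_flags(pod_a), readonly_flags(pod_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: kubectl get pod <name> -o json > works.json (likewise fails.json),
    # then: python diff_readonly.py works.json fails.json
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        print(diff_readonly(json.load(fa), json.load(fb)))
```

Checking both the volumeMount and the flexVolume flag matters here, since the failing spec omits readOnly in both places for scratch-blob.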

Interestingly (and surprisingly), the pod below does work. It is basically the same as the failing pod above, but without the init container. It also runs sleep 600 rather than Jupyter, but I don't think that's relevant, as the failing pod never gets to start anyway.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    hub.jupyter.org/username: theusername
  labels:
    app: jupyterhub
  name: works
  namespace: panzure-dev
  
spec:
  containers:
  - args:
    - sleep
    - '600'

    image: informaticslab/pangeo-notebook:0.8.1
    imagePullPolicy: IfNotPresent
    
    name: notebook
    ports:
    - containerPort: 8888
      name: notebook-port
      protocol: TCP
    resources:
      limits:
        cpu: "2"
        memory: "17179869184"
      requests:
        cpu: "1"
        memory: "2147483648"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /home/jovyan
      name: volume-theusername
    - mountPath: /scratch
      name: scratch
    - mountPath: /etc/daskernetes
      name: daskernetes-config
      readOnly: true
    - mountPath: /mo-uki-radar-comp-3yr
      name: mo-uki-radar-comp-3yr
      readOnly: true
    - mountPath: /scratch-blob
      name: scratch-blob
    - mountPath: /aws-earth-nc-files
      name: aws-earth-nc-files
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: daskkubernetes-token-2g854
      readOnly: true

  
  
  volumes:
  - name: volume-theusername
    persistentVolumeClaim:
      claimName: claim-theusername
  - name: scratch
    persistentVolumeClaim:
      claimName: scratch
  - configMap:
      defaultMode: 420
      name: daskernetes-config
    name: daskernetes-config
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: mo-uki-radar-comp-3yr
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/mo-uki-radar-comp-3yr
      readOnly: true
      secretRef:
        name: blobfusecreds
    name: mo-uki-radar-comp-3yr
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: scratch
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/scratch-blob
      secretRef:
        name: blobfusecreds
    name: scratch-blob
  - flexVolume:
      driver: azure/blobfuse
      options:
        container: aws-earth-nc-files
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/aws-earth-nc-files
      readOnly: true
      secretRef:
        name: earthblobfusecreds
    name: aws-earth-nc-files
  - name: daskkubernetes-token-2g854
    secret:
      defaultMode: 420
      secretName: daskkubernetes-token-2g854

Anything else we need to know?:

I realise this is a complicated issue and there isn't much to go on, but I feel there should be more log information somewhere; I just don't know where to find it.

I have destroyed the cluster (and resource group) multiple times, and this issue is repeatable every time.

Environment:

  • Kubernetes version (use kubectl version):
    Running on AKS
    Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:16Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release):
    Running Azure AKS (Linux); I am unable to access the host machine.
    From inside the pod/container:
    NAME="Ubuntu"
    VERSION="18.04.3 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.3 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic

  • Kernel (e.g. uname -a):
    From inside the container (as mentioned above, I cannot access the host):
    Linux works 4.15.0-1055-azure #60-Ubuntu SMP Thu Aug 8 18:29:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:
    AKS / AZ Cli
    Helm
    Zero2Jupyter helm chart
    Pangeo helm chart

  • Others:

There should be an azure-cni-networkmonitor-xxx pod under the kube-system namespace; could you copy /var/log/blobfuse-driver.log out of the aks-default-71061816-vmss000000 node:

kubectl cp kube-system/blobfuse-flexvol-installer-lk6dr:/var/log/blobfuse-driver.log .

There could also be a race condition issue, since you are using different Azure storage accounts. Could you remove this volume and try again:

  - flexVolume:
      driver: azure/blobfuse
      options:
        container: aws-earth-nc-files
        mountoptions: --file-cache-timeout-in-seconds=600
        tmppath: /tmp/aws-earth-nc-files
      readOnly: true
      secretRef:
        name: earthblobfusecreds
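One way to look for the overlap that a race condition would need is to check the driver log for blobfuse mount commands that a single pod issued within the same second. A rough sketch, assuming driver log lines of the form "Thu Sep 19 10:21:51 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/<uid>/volumes/azure~blobfuse/<volume> ..."; concurrent_mounts is a hypothetical helper. The log has one-second resolution, so a shared timestamp suggests, but does not prove, that mounts ran concurrently, and this does not check which storage account each mount used.

```python
import re
import sys
from collections import defaultdict

# Matches driver log lines such as:
# Thu Sep 19 10:21:51 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/<uid>/volumes/azure~blobfuse/<volume> ...
MOUNT_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ [\d:]+ UTC \d{4}) EXEC: blobfuse "
    r"/var/lib/kubelet/pods/(?P<uid>[^/]+)/volumes/azure~blobfuse/(?P<vol>\S+)"
)


def concurrent_mounts(lines):
    """Return {(pod_uid, timestamp): [volumes]} for every second in which one pod
    started more than one blobfuse mount. One-second timestamps make this a
    rough indicator of overlap, not proof of it."""
    buckets = defaultdict(list)
    for line in lines:
        match = MOUNT_RE.match(line)
        if match:
            buckets[(match["uid"], match["ts"])].append(match["vol"])
    return {key: vols for key, vols in buckets.items() if len(vols) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python concurrent_mounts.py blobfuse-driver.log
    with open(sys.argv[1]) as log:
        for (uid, ts), vols in concurrent_mounts(log).items():
            print(ts, uid, vols)
```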

Thanks @andyzhangx, but I can't find that pod.

$ kubectl get pods -n kube-system
NAME                                             READY   STATUS    RESTARTS   AGE
blobfuse-flexvol-installer-lk6dr                 1/1     Running   0          5h22m
coredns-69b5b66fd8-9qtbt                         1/1     Running   0          22h
coredns-69b5b66fd8-fd7fb                         1/1     Running   0          22h
coredns-autoscaler-65d7986c6b-979k8              1/1     Running   0          22h
external-dns-8cd545cfb-n5b2c                     1/1     Running   0          5h22m
kube-proxy-gsvtd                                 1/1     Running   0          74m
kubernetes-dashboard-cc4cc9f58-n77lz             1/1     Running   3          22h
metrics-server-66dbbb67db-7hs4t                  1/1     Running   0          22h
nginx-ingress-controller-5zfgq                   1/1     Running   0          5h22m
nginx-ingress-default-backend-6b8dc9d88f-zksqr   1/1     Running   0          5h22m
omsagent-gm8p9                                   1/1     Running   0          75m
omsagent-rs-6f4b46d595-jk5gm                     1/1     Running   0          75m
tiller-deploy-9bf6fb76d-d5nxx                    1/1     Running   0          5h22m
tunnelfront-65bd6b97d-jhkff                      1/1     Running   0          22h
$ kubectl get pods --all-namespaces | grep azure
$ 

No results for the above.

I'm not able to access the master, if that makes a difference:

$ kubectl get nodes
NAME                              STATUS   ROLES   AGE   VERSION
aks-default-71061816-vmss000000   Ready    agent   22h   v1.14.6
kubectl cp kube-system/blobfuse-flexvol-installer-lk6dr:/var/log/blobfuse-driver.log .

Please provide that log so I can check whether this is due to a race condition.

@andyzhangx, thanks. Looking at the examples above, 2019-09-19T09:47:47Z looks like the time I created the failing pod.

ENV Path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
Thu Sep 19 09:18:06 UTC 2019 INFO: {"status": "Success", "capabilities": {"attach": false}}
Thu Sep 19 09:47:58 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 09:47:58 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 09:47:58 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 09:47:58 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr --container-name=mo-uki-radar-comp-3yr --tmp-path=/tmp/mo-uki-radar-comp-3yr -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 09:47:58 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 09:47:58 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 09:47:58 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 09:47:58 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 09:47:58 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other  --file-cache-timeout-in-seconds=600
Thu Sep 19 09:47:59 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 09:47:59 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 09:47:59 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 09:47:59 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files --container-name=aws-earth-nc-files --tmp-path=/tmp/aws-earth-nc-files -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 09:47:59 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 09:47:59 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 09:52:46 UTC 2019 EXEC: umount /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files, devname: blobfuse
blobfuse
Thu Sep 19 09:52:46 UTC 2019 EXEC: umount /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr, devname: blobfuse
blobfuse
Thu Sep 19 09:52:46 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 09:52:46 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 09:52:46 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 09:52:46 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:21:51 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 10:21:51 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 10:21:51 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 10:21:51 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:21:51 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:21:51 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:21:51 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:21:51 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 10:21:51 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:21:51 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other  --file-cache-timeout-in-seconds=600
Thu Sep 19 10:21:51 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files --container-name=aws-earth-nc-files --tmp-path=/tmp/aws-earth-nc-files -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:21:51 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr --container-name=mo-uki-radar-comp-3yr --tmp-path=/tmp/mo-uki-radar-comp-3yr -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:21:51 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:21:52 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:21:52 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:23:25 UTC 2019 EXEC: umount /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob, devname: blobfuse
blobfuse
Thu Sep 19 10:23:25 UTC 2019 EXEC: umount /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr, devname: blobfuse
blobfuse
Thu Sep 19 10:23:25 UTC 2019 EXEC: umount /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files, devname: blobfuse
blobfuse
Thu Sep 19 10:23:25 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 10:23:25 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 10:23:25 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 10:23:25 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:23:25 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:23:25 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:23:49 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 10:23:49 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:23:49 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:23:49 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr --container-name=mo-uki-radar-comp-3yr --tmp-path=/tmp/mo-uki-radar-comp-3yr -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:23:49 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:23:49 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 10:23:49 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:23:49 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 10:23:49 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files --container-name=aws-earth-nc-files --tmp-path=/tmp/aws-earth-nc-files -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:23:49 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 10:23:49 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:23:49 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:23:49 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other  --file-cache-timeout-in-seconds=600
Thu Sep 19 10:23:49 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:23:49 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:25:13 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 10:25:13 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:25:13 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:25:13 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr --container-name=mo-uki-radar-comp-3yr --tmp-path=/tmp/mo-uki-radar-comp-3yr -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:25:14 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:25:14 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 10:25:14 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:25:14 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:25:14 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other  --file-cache-timeout-in-seconds=600
Thu Sep 19 10:25:14 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:25:14 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 10:25:14 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:25:14 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 10:25:14 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files --container-name=aws-earth-nc-files --tmp-path=/tmp/aws-earth-nc-files -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:25:14 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:29:51 UTC 2019 EXEC: umount /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files, devname: blobfuse
blobfuse
Thu Sep 19 10:29:51 UTC 2019 EXEC: umount /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr, devname: blobfuse
blobfuse
Thu Sep 19 10:29:51 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 10:29:51 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 10:29:51 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:29:51 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:30:23 UTC 2019 EXEC: umount /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob, devname: blobfuse
blobfuse
Thu Sep 19 10:30:23 UTC 2019 EXEC: umount /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr, devname: blobfuse
blobfuse
Thu Sep 19 10:30:23 UTC 2019 EXEC: umount /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files, devname: blobfuse
blobfuse
Thu Sep 19 10:30:23 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 10:30:23 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:30:23 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 10:30:23 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/918bc333-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 10:30:23 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:30:23 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:30:47 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 10:30:47 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:30:47 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:30:47 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr --container-name=mo-uki-radar-comp-3yr --tmp-path=/tmp/mo-uki-radar-comp-3yr -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:30:47 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 10:30:47 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:30:47 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:30:47 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:30:47 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:30:47 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:30:47 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 10:30:47 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:30:47 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 10:30:47 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files --container-name=aws-earth-nc-files --tmp-path=/tmp/aws-earth-nc-files -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 10:30:47 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 10:56:39 UTC 2019 EXEC: umount /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob, devname: blobfuse
blobfuse
Thu Sep 19 10:56:39 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/89a09755-dac2-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 10:56:39 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 11:34:24 UTC 2019 EXEC: umount /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob, devname: blobfuse
blobfuse
Thu Sep 19 11:34:24 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/bcae3b3c-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 11:34:24 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 11:40:25 UTC 2019 EXEC: umount /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr, devname: blobfuse
blobfuse
Thu Sep 19 11:40:25 UTC 2019 EXEC: umount /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files, devname: blobfuse
blobfuse
Thu Sep 19 11:40:25 UTC 2019 EXEC: umount /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob, devname: blobfuse
blobfuse
Thu Sep 19 11:40:25 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 11:40:25 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 11:40:25 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 11:40:25 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/89a3a078-dac8-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 11:40:25 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 11:40:25 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 13:19:20 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/16a1e0f4-dae0-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 13:19:20 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 13:19:20 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 13:19:20 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/16a1e0f4-dae0-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr --container-name=mo-uki-radar-comp-3yr --tmp-path=/tmp/mo-uki-radar-comp-3yr -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 13:19:20 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/16a1e0f4-dae0-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 13:19:20 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 13:19:20 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 13:19:20 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/16a1e0f4-dae0-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other  --file-cache-timeout-in-seconds=600
Thu Sep 19 13:19:20 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 13:19:20 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 13:19:21 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/16a1e0f4-dae0-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 13:19:21 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 13:19:21 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 13:19:21 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/16a1e0f4-dae0-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files --container-name=aws-earth-nc-files --tmp-path=/tmp/aws-earth-nc-files -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 13:19:21 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:38:30 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 14:38:30 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 14:38:30 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 14:38:30 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 14:38:30 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 14:38:30 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files --container-name=aws-earth-nc-files --tmp-path=/tmp/aws-earth-nc-files -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 14:38:30 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 14:38:30 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr --container-name=mo-uki-radar-comp-3yr --tmp-path=/tmp/mo-uki-radar-comp-3yr -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 14:38:30 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 14:38:30 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 14:38:30 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 14:38:30 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 14:38:30 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:38:30 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:38:30 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:39:13 UTC 2019 EXEC: umount /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr, devname: blobfuse
blobfuse
Thu Sep 19 14:39:13 UTC 2019 EXEC: umount /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob, devname: blobfuse
blobfuse
Thu Sep 19 14:39:13 UTC 2019 EXEC: umount /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files, devname: blobfuse
blobfuse
Thu Sep 19 14:39:13 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 14:39:13 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:39:13 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 14:39:13 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:39:13 UTC 2019 EXEC: rmdir /var/lib/kubelet/pods/25eb2058-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 14:39:13 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:39:18 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/42b45385-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files
Thu Sep 19 14:39:18 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 14:39:18 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 14:39:18 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/42b45385-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/aws-earth-nc-files --container-name=aws-earth-nc-files --tmp-path=/tmp/aws-earth-nc-files -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 14:39:19 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:39:19 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/42b45385-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr
Thu Sep 19 14:39:19 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 14:39:19 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 14:39:19 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/42b45385-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/mo-uki-radar-comp-3yr --container-name=mo-uki-radar-comp-3yr --tmp-path=/tmp/mo-uki-radar-comp-3yr -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 14:39:19 UTC 2019 INFO: {"status": "Success"}
Thu Sep 19 14:39:19 UTC 2019 EXEC: mkdir -p /var/lib/kubelet/pods/42b45385-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob
Thu Sep 19 14:39:19 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 14:39:19 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 14:39:19 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/42b45385-daeb-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other -o ro --file-cache-timeout-in-seconds=600
Thu Sep 19 14:39:19 UTC 2019 INFO: {"status": "Success"}

I can't see anything obvious in there, but I'm not an expert. I'm going to kill that pod and then redeploy only the failing version of the pod, to simplify the log and see what happens.

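It's due to a race condition. In this excerpt the mount calls logged at 10:21:51 interleave, so both storage accounts have been exported before scratch-blob is mounted: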

Thu Sep 19 10:21:51 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:21:51 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:21:51 UTC 2019 INF: AZURE_STORAGE_ACCESS_KEY is set 
Thu Sep 19 10:21:51 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:21:51 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=awsearth 
Thu Sep 19 10:21:51 UTC 2019 INF: export storage account - export AZURE_STORAGE_ACCOUNT=panzuredata 
Thu Sep 19 10:21:51 UTC 2019 EXEC: blobfuse /var/lib/kubelet/pods/4b9a9926-dac7-11e9-a307-9ac9506c4c50/volumes/azure~blobfuse/scratch-blob --container-name=scratch --tmp-path=/tmp/scratch-blob -o allow_other  --file-cache-timeout-in-seconds=600
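
One rough way to spot this pattern in a longer driver log is to look for an "EXEC: blobfuse" line that is preceded by more than one not-yet-consumed "export storage account" line, as in the excerpt above. A minimal bash/awk sketch, assuming the log line format shown in this thread; detect_overlap is a made-up helper name, not part of the driver:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the driver): report "EXEC: blobfuse" lines
# preceded by more than one pending "export storage account" line, i.e. places
# where concurrent mount calls interleaved in the log.
# Assumes the log line format shown in this thread.
detect_overlap() {
  awk '
    /INF: export storage account/ { pending++; next }
    /EXEC: blobfuse / {
      if (pending > 1) printf "overlap (%d exports pending): %s\n", pending, $0
      if (pending > 0) pending--   # this mount consumed one export
    }
  ' "$1"
}
```

Run it against the driver log on the node, e.g. detect_overlap /var/log/blobfuse-driver.log (that path is an assumption; point it at wherever your driver writes its log). It is only a heuristic: interleaving in the log shows that mounts overlapped, not on its own that a mount picked up the wrong account.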

I think the current workaround is to have all blobfuse flexvolumes on the same pod use the same storage account. Since you are using two storage accounts, that's why it sometimes fails.

I will add a mutex lock to the driver code and publish a new release.
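
The thread doesn't show how the lock in the fix is implemented. Purely as an illustration, and assuming the driver is a bash script, one common way to get a node-wide mutex is flock(1) from util-linux, holding an exclusive lock on a file for the duration of each mount. LOCK_FILE, do_mount and locked_mount are hypothetical names, and do_mount only stands in for the real mount steps:

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the driver's actual code: serialize mount calls
# with flock(1) so that only one runs at a time on the node.
LOCK_FILE="${LOCK_FILE:-/tmp/blobfuse-flexvol.lock}"   # assumed lock path

# Stand-in for the real per-volume mount steps (export credentials,
# mkdir the target, run blobfuse). Here it only reports what it would do.
do_mount() {
  local account="$1" target="$2"
  export AZURE_STORAGE_ACCOUNT="$account"
  echo "mount $target with account $AZURE_STORAGE_ACCOUNT"
}

# Run do_mount while holding an exclusive lock on LOCK_FILE.
# The subshell opens fd 9 on the lock file; the lock is released when the
# subshell exits. -w 30 gives up after 30 seconds instead of blocking forever.
locked_mount() {
  (
    flock -w 30 9 || { echo "could not lock $LOCK_FILE" >&2; exit 1; }
    do_mount "$@"
  ) 9>"$LOCK_FILE"
}
```

Each mount call would then go through something like locked_mount panzuredata <target>. The trade-off is that mounts on a node can no longer proceed in parallel, which adds a little pod start-up latency.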

OK, fab. That would explain why it used to work (before we added the extra account). Thanks!

Created a PR to fix the above: #103