Azure/kubernetes-volume-drivers

Failure mounting PVC for Azure Blob Storage using Blobfuse when scaling out to more than 1 pod in AKS

ankurkapoor opened this issue · 2 comments

What happened:
I am using a single-node AKS cluster to host a blob storage mount via blobfuse. There is a reconstruct service that works fine when I run it with 1 replica but fails when I try to scale it out to any larger number of pods. During scale-out, all pods except one fail to start with the error below.

MountVolume.SetUp failed for volume "bv-blobfuse-flexvol" : invalid character 'E' looking for beginning of value
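The wording of this error matches what Go's `encoding/json` reports when the input is not JSON and begins with the letter `E`. A minimal sketch of that behaviour follows, assuming the caller expects a JSON status object shaped like the one in the driver log. The `driverStatus` type, the `parseDriverOutput` helper and the sample input strings are hypothetical, chosen only for illustration; this is not the kubelet's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// driverStatus is a hypothetical type mirroring the "status" and "message"
// fields visible in the driver's JSON log line.
type driverStatus struct {
	Status  string `json:"status"`
	Message string `json:"message"`
}

// parseDriverOutput tries to decode raw driver output as a JSON status object.
func parseDriverOutput(out string) (driverStatus, error) {
	var st driverStatus
	err := json.Unmarshal([]byte(out), &st)
	return st, err
}

func main() {
	// Hypothetical non-JSON output that starts with 'E': decoding fails
	// on the very first character.
	_, err := parseDriverOutput("EXEC: not JSON")
	fmt.Println(err)

	// Well-formed JSON decodes without error.
	st, err := parseDriverOutput(`{"status": "Failure", "message": "Failed to mount"}`)
	fmt.Println(st.Status, err)
}
```

If this reading is right, the message shown by Kubernetes hides the real failure, and the driver log is the place to look for the underlying mount error.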

Below is the log from the blobfuse-driver.log file:

Wed Jul 1 10:47:36 UTC 2020 EXEC: mkdir -p /var/lib/kubelet/pods/1066ed68-df0f-423b-a1d6-2694e07550d7/volumes/azureblobfuse/bv-blobfuse-flexvol
Wed Jul 1 10:47:36 UTC 2020 INF: AZURE_STORAGE_ACCESS_KEY is set
Wed Jul 1 10:47:36 UTC 2020 INF: export storage account - export AZURE_STORAGE_ACCOUNT=bvdevstr
Wed Jul 1 10:47:36 UTC 2020 EXEC: blobfuse /var/lib/kubelet/pods/1066ed68-df0f-423b-a1d6-2694e07550d7/volumes/azureblobfuse/bv-blobfuse-flexvol --container-name=biovolumeisetstore --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=0
Wed Jul 1 10:47:36 UTC 2020 ERROR: { "status": "Failure", "message": "Failed to mount device /dev/ at /var/lib/kubelet/pods/1066ed68-df0f-423b-a1d6-2694e07550d7/volumes/azureblobfuse/bv-blobfuse-flexvol, accountname:bvdevstr, error log:Wed Jul 1 10:47:36 UTC 2020 EXEC: blobfuse /var/lib/kubelet/pods/1066ed68-df0f-423b-a1d6-2694e07550d7/volumes/azureblobfuse/bv-blobfuse-flexvol --container-name=biovolumeisetstore --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=0" }

What you expected to happen:

As with default pod scaling behaviour, I would expect all the replica pods to start normally (just like the first pod), without errors and with the mount correctly set up via blobfuse.

How to reproduce it:

On an existing AKS cluster with Blobfuse installed (using a DaemonSet), use the attached pvcblobfuse.txt to set up the PV and PVC.
Then use any existing image to set up a deployment using poddeployment.txt to create 2 replicas of the service.

The first pod will start without any issues but the 2nd one will fail to start.

Anything else we need to know?:
I have raised this issue at the Blobfuse repo as well (Azure/azure-storage-fuse#430), but a comment there suggested it is more of a driver-related issue.

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"windows/amd64"}
    Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"ec831747a3a5896dbdf53f259eafea2a2595217c", GitTreeState:"clean", BuildDate:"2020-05-29T19:56:10Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="16.04.6 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.6 LTS"
    VERSION_ID="16.04"

  • Kernel (e.g. uname -a):Linux aks-nodepool1-41959266-vmss000000 4.15.0-1083-azure #93~16.04.1-Ubuntu SMP Thu May 7 18:39:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools: no additional tools installed.

  • Others:
    blobfuse 1.2.3
    Image: mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume:1.0.13
    Image ID: docker-pullable://mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume@sha256:d550ef47c218e7eb4467bf22b412a0bc20cdd7b261325360529944e8e212d833

poddeployment.txt
pvcblobfuse.txt

We are seeing very similar behaviour using AKS and blob storage via blobfuse.

The issue is caused by the same --tmp-path=/tmp/blobfuse being used on the two PVs. Please follow the workaround mentioned in #66 (comment).

In the long term, please try the Azure Blob Storage CSI driver; it also supports the NFSv3 protocol as of the latest version, v0.7.0.