Failure in mounting PVC for Azure Blob Storage using Blobfuse when scaling out to more than 1 pods in AKS
ankurkapoor opened this issue · 2 comments
What happened:
I am using an AKS single-node cluster to host a blob storage mount via blobfuse. There is a reconstruct service that works fine when I run it with 1 replica, but fails when I try to scale it out to any higher number of pods. During scale-out, all pods except one fail to start with the error below.
MountVolume.SetUp failed for volume "bv-blobfuse-flexvol" : invalid character 'E' looking for beginning of value
Below is the log from the blobfuse-driver.log file:
Wed Jul 1 10:47:36 UTC 2020 EXEC: mkdir -p /var/lib/kubelet/pods/1066ed68-df0f-423b-a1d6-2694e07550d7/volumes/azureblobfuse/bv-blobfuse-flexvolblobfuse/bv-blobfuse-flexvol --container-name=biovolumeisetstore --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=0
Wed Jul 1 10:47:36 UTC 2020 INF: AZURE_STORAGE_ACCESS_KEY is set
Wed Jul 1 10:47:36 UTC 2020 INF: export storage account - export AZURE_STORAGE_ACCOUNT=bvdevstr
Wed Jul 1 10:47:36 UTC 2020 EXEC: blobfuse /var/lib/kubelet/pods/1066ed68-df0f-423b-a1d6-2694e07550d7/volumes/azure
Wed Jul 1 10:47:36 UTC 2020 ERROR: { "status": "Failure", "message": "Failed to mount device /dev/ at /var/lib/kubelet/pods/1066ed68-df0f-423b-a1d6-2694e07550d7/volumes/azureblobfuse/bv-blobfuse-flexvol, accountname:bvdevstr, error log:Wed Jul 1 10:47:36 UTC 2020 EXEC: blobfuse /var/lib/kubelet/pods/1066ed68-df0f-423b-a1d6-2694e07550d7/volumes/azureblobfuse/bv-blobfuse-flexvol --container-name=biovolumeisetstore --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=0" }
What you expected to happen:
As the default pod scaling behaviour, I would expect all replica pods to start normally (just like the first pod), without errors and with the mount correctly set up via blobfuse.
How to reproduce it:
On an existing AKS cluster with Blobfuse installed (via its DaemonSet), set up the PV and PVC from the attached pvcblobfuse.txt.
Then use any existing image to set up a deployment from poddeployment.txt that creates 2 replicas of the service.
The first pod will start without any issues, but the 2nd one will fail to start.
Anything else we need to know?:
I have raised this issue at the Blobfuse repo as well (Azure/azure-storage-fuse#430), but a comment there suggested it is more likely a driver-related issue.
Environment:
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"windows/amd64"}
  Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"ec831747a3a5896dbdf53f259eafea2a2595217c", GitTreeState:"clean", BuildDate:"2020-05-29T19:56:10Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release):
  NAME="Ubuntu"
  VERSION="16.04.6 LTS (Xenial Xerus)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 16.04.6 LTS"
  VERSION_ID="16.04"
- Kernel (e.g. uname -a): Linux aks-nodepool1-41959266-vmss000000 4.15.0-1083-azure #93~16.04.1-Ubuntu SMP Thu May 7 18:39:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: no additional tools installed.
- Others:
  blobfuse 1.2.3
  Image: mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume:1.0.13
  Image ID: docker-pullable://mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume@sha256:d550ef47c218e7eb4467bf22b412a0bc20cdd7b261325360529944e8e212d833
We are seeing very similar behaviour using AKS and blob storage via blobfuse.
The issue is caused by the same --tmp-path=/tmp/blobfuse being used on the two PVs; please follow the workaround mentioned in #66 (comment).
In the long term, please try the Azure Blob Storage CSI driver; we also support the NFSv3 protocol in the latest v0.7.0 version.
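The workaround above amounts to giving each PV its own blobfuse cache directory instead of sharing /tmp/blobfuse. A minimal sketch of two such PVs is below; the PV, secret, and container names are placeholders (not from this issue), and the option keys follow the blobfuse FlexVolume examples, so adjust them to your actual setup:

```yaml
# Hypothetical sketch: two FlexVolume PVs, each with a unique tmppath
# so their blobfuse caches do not collide on the same node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-blobfuse-a            # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  flexVolume:
    driver: "azure/blobfuse"
    secretRef:
      name: storage-secret       # placeholder secret
    options:
      container: container-a     # placeholder container
      tmppath: /tmp/blobfuse-a   # unique per PV
      mountoptions: "--file-cache-timeout-in-seconds=0"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-blobfuse-b            # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  flexVolume:
    driver: "azure/blobfuse"
    secretRef:
      name: storage-secret       # placeholder secret
    options:
      container: container-b     # placeholder container
      tmppath: /tmp/blobfuse-b   # unique per PV
      mountoptions: "--file-cache-timeout-in-seconds=0"
```

The key point is only that tmppath differs between the two PVs; everything else can stay as in your existing pvcblobfuse.txt.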