Problems with blobfuse FlexVolume driver for Kubernetes on AKS v1.18.x
ingeknudsen opened this issue · 10 comments
What happened:
We are investigating adding a feature to mount blob storage on our platform, which is built on top of Kubernetes running on AKS. We followed this guide:
https://github.com/Azure/kubernetes-volume-drivers/tree/master/flexvolume/blobfuse
This works perfectly on AKS 1.17.11, 1.17.9, 1.16.15 and 1.16.13 (versions available on the portal). However, on 1.18.8 and 1.18.6 there are no files in the mounted volume.
What you expected to happen:
Expected it to work as well as it did for the other versions
How to reproduce it:
Bootstrap script (obfuscated some of the inputs):
az network vnet create -g clusters -n vnet-flexvolume-issue --address-prefix ## --subnet-name subnet-flexvolume-issue --subnet-prefix ## --location northeurope
az role assignment create --assignee ## --role 'Network Contributor' --scope /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue
az network vnet peering create -g clusters -n cluster-to-hub --vnet-name vnet-flexvolume-issue --remote-vnet /subscriptions/##/resourceGroups/cluster-vnet-hub-dev/providers/Microsoft.Network/virtualNetworks/vnet-hub --allow-vnet-access
az network vnet peering create -g cluster-vnet-hub-dev -n hub-to-flexvolume-issue --vnet-name vnet-hub --remote-vnet /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue --allow-vnet-access
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.web.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.postgres.database.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.blob.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.mariadb.databas
Linking private DNS Zone: privatelink.documents.azure.com to K8S VNET /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.mysql.database.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.dfs.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.table.cosmos.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.documents.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.queue.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.database.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.gremlin.cosmos.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.file.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.mongo.cosmos.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az aks create \
  --resource-group clusters \
  --name flexvolume-issue \
  --no-ssh-key \
  --kubernetes-version 1.18.8 \
  --service-principal ## \
  --client-secret ## \
  --node-count 3 \
  --node-osdisk-size 512 \
  --node-vm-size Standard_DS4_v2 \
  --max-pods 110 \
  --network-plugin azure \
  --network-policy calico \
  --vnet-subnet-id /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue/subnets/subnet-flexvolume-issue \
  --docker-bridge-address ## \
  --dns-service-ip ## \
  --service-cidr ## \
  --aad-server-app-id ## \
  --aad-server-app-secret ## \
  --aad-client-app-id ## \
  --aad-tenant-id ## \
  --location northeurope
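The repeated private DNS link commands in the bootstrap script could be generated from a list of zones instead of being written out one by one. The following is only a dry-run sketch: it prints the commands rather than running them, the function name print_dns_link_commands is hypothetical, and only three of the zones from the script are listed for brevity. The flags are copied from the commands in the script.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one "az network private-dns link vnet create" command per zone.
# print_dns_link_commands is a hypothetical helper, not part of the Azure CLI.

print_dns_link_commands() {
  local resource_group="$1" link_name="$2" vnet_id="$3"
  shift 3
  local zone
  for zone in "$@"; do
    # Same flags as in the bootstrap script; only the zone (-z) varies.
    echo "az network private-dns link vnet create -g ${resource_group} -n ${link_name} -z ${zone} -v ${vnet_id} -e False"
  done
}

# Obfuscated VNet ID, as in the original script.
VNET_ID='/subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue'

print_dns_link_commands cluster-vnet-hub-dev flexvolume-issue-link "$VNET_ID" \
  privatelink.blob.core.windows.net \
  privatelink.dfs.core.windows.net \
  privatelink.file.core.windows.net
```

Printing the commands first makes it easy to review them before running any of them.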
Anything else we need to know?:
kubectl describe po blobfuse-flexvol-installer-cnbtb -n kube-system | grep blobfuse-flexvolume
Image: mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume:1.0.13
Image ID: docker-pullable://mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume@sha256:d550ef47c218e7eb4467bf22b412a0bc20cdd7b261325360529944e8e212d833
sudo cat /var/log/blobfuse-flexvol-installer.log
begin to install blobfuse FlexVolume driver 1.0.13, target dir:/etc/kubernetes/volumeplugins ...
install blobfuse FlexVolume driver completed.
sudo cat /var/log/blobfuse-driver.log
ENV Path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
Wed Oct 7 18:05:55 UTC 2020 INFO: {"status": "Success", "capabilities": {"attach": false, "fsGroup": false}}
blobfuse test --container-name=## --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120
fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
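The fusermount message above refers to /etc/fuse.conf. A minimal sketch for checking on a node whether an uncommented user_allow_other line is present is shown below. The function name fuse_allows_other is hypothetical, and this check alone does not establish whether the setting is the cause of the empty mount.

```shell
#!/usr/bin/env bash
# Sketch: report whether user_allow_other is enabled in a fuse.conf file.
# fuse_allows_other is a hypothetical helper; the default path is /etc/fuse.conf.

fuse_allows_other() {
  conf_file="${1:-/etc/fuse.conf}"
  # Match only an uncommented line consisting of "user_allow_other".
  grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf_file" 2>/dev/null
}

if fuse_allows_other /etc/fuse.conf; then
  echo "user_allow_other is set"
else
  echo "user_allow_other is NOT set (or /etc/fuse.conf is missing)"
fi
```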
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-18T02:59:13Z", GoVersion:"go1.14.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-14T00:06:38Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
- Kernel (e.g. uname -a):
Linux aks-nodepool1-52800181-vmss000000 5.4.0-1026-azure #26~18.04.1-Ubuntu SMP Thu Sep 10 16:19:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Could you kubectl-enter the node and check blobfuse --version? I think it's due to a higher blobfuse version. Downgrading to version 1.1.1 may help:
kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/binary/sysctl-install-blobfuse-1.1.1.yaml
Also, is it possible to use https://github.com/kubernetes-sigs/blob-csi-driver?
@NaraVen do you know which blobfuse version brought this breaking change? Thanks.
blobfuse test --container-name=## --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120
fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
azureuser@aks-nodepool1-########-vmss000000:/etc$ blobfuse --version
blobfuse 1.0.3
Wouldn't the blobfuse version follow mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume:1.0.13?
How come the blobfuse version is older on the 1.18 cluster? On a cluster where this works:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-18T02:59:13Z", GoVersion:"go1.14.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"b93345fce4b93610948e30f12c767f5dabd3d570", GitTreeState:"clean", BuildDate:"2020-08-24T20:09:56Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
azureuser@aks-nodepool1-########-vmss000000:~$ blobfuse --version
blobfuse 1.2.3
On my AKS 1.18.6 node, the allow_other option works:
# k get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-nodepool1-13451060-vmss00000a Ready agent 30h v1.18.6 10.240.0.4 <none> Ubuntu 18.04.5 LTS 5.3.0-1034-azure docker://3.0.10+azure
aks-nodepool1-13451060-vmss00000b Ready agent 30h v1.18.6 10.240.0.5 <none> Ubuntu 18.04.5 LTS 5.3.0-1034-azure docker://3.0.10+azure
# blobfuse test --container-name=pvc-1b94cbd0-4349-4557-9adf-3243c69053f0 --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120
root@aks-nodepool1-13451060-vmss00000A:/# uname -a
Linux aks-nodepool1-13451060-vmss00000A 5.3.0-1034-azure #35~18.04.1-Ubuntu SMP Mon Jul 13 12:54:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
# blobfuse --version
blobfuse 1.0.3
As a simple workaround, try:
kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/binary/sysctl-install-blobfuse-1.2.4.yaml
The blobfuse binary is no longer built into mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume:1.0.13.
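To confirm which blobfuse version a node ends up with after applying the manifest, the output of blobfuse --version can be parsed and compared against a target version. This is a sketch under assumptions: the helper names are hypothetical, sort -V requires GNU coreutils (or a compatible sort), and 1.2.4 is used as the threshold only because that is the version the workaround installs. The sample input is the version string reported earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch: compare a blobfuse version string against a minimum version.
# version_at_least and parse_blobfuse_version are hypothetical helpers.

# Succeeds when version $1 is greater than or equal to version $2.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version from output shaped like "blobfuse 1.0.3".
parse_blobfuse_version() {
  awk '{ print $2 }'
}

# On a node, replace the echo with: blobfuse --version
installed="$(echo 'blobfuse 1.0.3' | parse_blobfuse_version)"

if version_at_least "$installed" 1.2.4; then
  echo "blobfuse $installed is at least 1.2.4"
else
  echo "blobfuse $installed is older than 1.2.4"
fi
```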
That did the trick 😄. Is this a permanent workaround that you recommend we apply on every bootstrap of a new cluster, or is it a bug in the 1.18.x version that we should expect to be fixed at some point?
Using blobfuse 1.2.4 fixes this empty mount issue, but leads to problems when pods sharing the same PV run on the same node.
Error: FailedMount ... MountVolume.SetUp failed for volume ... : invalid character 'E' looking for beginning of value
Already upgraded to blobfuse 1.3.5 in the next AKS release: Azure/AgentBaker#439
@ggrunin about your issue, try the following workaround (it will install blobfuse 1.3.5 and flexvolume driver 1.0.16):
kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/binary/sysctl-install-blobfuse-1.3.5.yaml
kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/deployment/blobfuse-flexvol-installer-1.9.yaml
@andyzhangx Thanks! It helped!