Azure/kubernetes-volume-drivers

Problems with blobfuse FlexVolume driver for Kubernetes on AKS v1.18.x

ingeknudsen opened this issue · 10 comments

What happened:

We are investigating adding a feature to mount blob storage on our platform, which is built on top of Kubernetes running on AKS. We followed this guide:
https://github.com/Azure/kubernetes-volume-drivers/tree/master/flexvolume/blobfuse

This works perfectly on AKS 1.17.11, 1.17.9, 1.16.15 and 1.16.13 (versions available on the portal). However, on 1.18.8 and 1.18.6 there are no files in the mounted volume.

What you expected to happen:

We expected it to work as well as it did on the other versions.

How to reproduce it:

Bootstrap script (obfuscated some of the inputs):

az network vnet create -g clusters -n vnet-flexvolume-issue --address-prefix ## --subnet-name subnet-flexvolume-issue --subnet-prefix ## --location northeurope
az role assignment create --assignee ## --role 'Network Contributor' --scope /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue

az network vnet peering create -g clusters -n cluster-to-hub --vnet-name vnet-flexvolume-issue --remote-vnet /subscriptions/##/resourceGroups/cluster-vnet-hub-dev/providers/Microsoft.Network/virtualNetworks/vnet-hub --allow-vnet-access
az network vnet peering create -g cluster-vnet-hub-dev -n hub-to-flexvolume-issue --vnet-name vnet-hub --remote-vnet /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue --allow-vnet-access

az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.web.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.postgres.database.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.blob.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.mariadb.database.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.mysql.database.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.dfs.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.table.cosmos.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.documents.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.queue.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.database.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.gremlin.cosmos.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.file.core.windows.net -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False
az network private-dns link vnet create -g cluster-vnet-hub-dev -n flexvolume-issue-link -z privatelink.mongo.cosmos.azure.com -v /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue -e False

az aks create \
    --resource-group clusters \
    --name flexvolume-issue \
    --no-ssh-key \
    --kubernetes-version 1.18.8 \
    --service-principal ## \
    --client-secret ## \
    --node-count 3 \
    --node-osdisk-size 512 \
    --node-vm-size Standard_DS4_v2 \
    --max-pods 110 \
    --network-plugin azure \
    --network-policy calico \
    --vnet-subnet-id /subscriptions/##/resourceGroups/clusters/providers/Microsoft.Network/virtualNetworks/vnet-flexvolume-issue/subnets/subnet-flexvolume-issue \
    --docker-bridge-address ## \
    --dns-service-ip ## \
    --service-cidr ## \
    --aad-server-app-id ## \
    --aad-server-app-secret ## \
    --aad-client-app-id ## \
    --aad-tenant-id ## \
    --location northeurope

Anything else we need to know?:

kubectl describe po blobfuse-flexvol-installer-cnbtb -n kube-system | grep blobfuse-flexvolume
Image:          mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume:1.0.13
Image ID:       docker-pullable://mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume@sha256:d550ef47c218e7eb4467bf22b412a0bc20cdd7b261325360529944e8e212d833
sudo cat /var/log/blobfuse-flexvol-installer.log
begin to install blobfuse FlexVolume driver 1.0.13, target dir:/etc/kubernetes/volumeplugins ...
install blobfuse FlexVolume driver completed.
sudo cat /var/log/blobfuse-driver.log
ENV Path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
Wed Oct 7 18:05:55 UTC 2020 INFO: {"status": "Success", "capabilities": {"attach": false, "fsGroup": false}}
blobfuse test --container-name=## --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120
fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
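The fusermount message above names a concrete condition: `user_allow_other` must be set in /etc/fuse.conf. As a minimal sketch, the following checks a node for an uncommented `user_allow_other` line. The function name `has_user_allow_other` is hypothetical, the script only reads the file, and nothing here establishes that this setting is the cause of the empty mount.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given FUSE config file (default
# /etc/fuse.conf) contains an uncommented user_allow_other line.
has_user_allow_other() {
  conf="${1:-/etc/fuse.conf}"
  [ -r "$conf" ] && grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf"
}

# Report the state of the default config file on this node.
if has_user_allow_other; then
  echo "user_allow_other is set in /etc/fuse.conf"
else
  echo "user_allow_other is not set (or /etc/fuse.conf is not readable)"
fi
```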

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-18T02:59:13Z", GoVersion:"go1.14.3", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-14T00:06:38Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="18.04.5 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.5 LTS"
    VERSION_ID="18.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=bionic
    UBUNTU_CODENAME=bionic

  • Kernel (e.g. uname -a):
    Linux aks-nodepool1-52800181-vmss000000 5.4.0-1026-azure #26~18.04.1-Ubuntu SMP Thu Sep 10 16:19:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Could you kubectl-enter the node and check blobfuse --version? I think it's due to a higher blobfuse version. Downgrading to version 1.1.1 may help:

kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/binary/sysctl-install-blobfuse-1.1.1.yaml

Also, is it possible to use https://github.com/kubernetes-sigs/blob-csi-driver?

@NaraVen do you know which blobfuse version brought this breaking change? Thanks.

blobfuse test --container-name=## --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120
fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf
azureuser@aks-nodepool1-########-vmss000000:/etc$ blobfuse --version
blobfuse 1.0.3

Wouldn't the blobfuse version follow mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume:1.0.13?

Here is a cluster on which this works. How come the blobfuse version is older on the 1.18 cluster?

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-18T02:59:13Z", GoVersion:"go1.14.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.11", GitCommit:"b93345fce4b93610948e30f12c767f5dabd3d570", GitTreeState:"clean", BuildDate:"2020-08-24T20:09:56Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}

azureuser@aks-nodepool1-########-vmss000000:~$ blobfuse --version
blobfuse 1.2.3

On my AKS 1.18.6 node, the allow_other option works:

# k get no -o wide
NAME                                STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
aks-nodepool1-13451060-vmss00000a   Ready    agent   30h   v1.18.6   10.240.0.4    <none>        Ubuntu 18.04.5 LTS   5.3.0-1034-azure   docker://3.0.10+azure
aks-nodepool1-13451060-vmss00000b   Ready    agent   30h   v1.18.6   10.240.0.5    <none>        Ubuntu 18.04.5 LTS   5.3.0-1034-azure   docker://3.0.10+azure

# blobfuse test --container-name=pvc-1b94cbd0-4349-4557-9adf-3243c69053f0 --tmp-path=/tmp/blobfuse -o allow_other --file-cache-timeout-in-seconds=120
root@aks-nodepool1-13451060-vmss00000A:/# uname -a
Linux aks-nodepool1-13451060-vmss00000A 5.3.0-1034-azure #35~18.04.1-Ubuntu SMP Mon Jul 13 12:54:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# blobfuse --version
blobfuse 1.0.3

As a simple workaround, try:

kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/binary/sysctl-install-blobfuse-1.2.4.yaml

The blobfuse binary is no longer built into mcr.microsoft.com/k8s/flexvolume/blobfuse-flexvolume:1.0.13.
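Because the binary is installed separately from the driver image, it can help to confirm on the node which blobfuse version is actually present after applying the manifest. The sketch below parses the `blobfuse --version` output format seen in this thread and compares it against 1.2.4. The function names are hypothetical, `sort -V` is assumed to be available (as in GNU coreutils), and 1.2.4 is used as the threshold only because that is the version this workaround installs; the thread does not establish which version changed the behavior.

```shell
#!/bin/sh
# Hypothetical helpers, not part of blobfuse or the FlexVolume driver.

# Print the version from a line such as "blobfuse 1.0.3".
parse_blobfuse_version() {
  printf '%s\n' "$1" | awk '$1 == "blobfuse" { print $2; exit }'
}

# Succeed if version $1 is greater than or equal to version $2.
# Assumes a sort that supports -V (version sort), as GNU coreutils does.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only runs on a node where blobfuse is installed.
if command -v blobfuse >/dev/null 2>&1; then
  installed="$(parse_blobfuse_version "$(blobfuse --version)")"
  # 1.2.4 is an assumed threshold taken from the workaround in this thread.
  if version_at_least "$installed" "1.2.4"; then
    echo "blobfuse $installed: at or above 1.2.4"
  else
    echo "blobfuse $installed: below 1.2.4"
  fi
fi
```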

That did the trick 😄. Is that a permanent workaround you recommend we apply on every bootstrap of a new cluster, or is this a bug in the 1.18.x version that we should expect to be fixed at some point?

Using blobfuse 1.2.4 fixes this empty mount issue, but leads to problems when pods that share the same PV run on the same node.
Error: FailedMount ... MountVolume.SetUp failed for volume ... : invalid character 'E' looking for beginning of value
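The message `invalid character 'E' looking for beginning of value` has the form of a Go JSON decoding error, which suggests that the text being parsed began with the letter E instead of JSON such as the `{"status": "Success", ...}` line shown in the driver log earlier in this thread. As a rough sketch, the check below tests whether captured output starts like a JSON object. The function name is hypothetical, the sample strings are made up for illustration, and the check looks only at the first non-whitespace character rather than validating JSON.

```shell
#!/bin/sh
# Hypothetical check, not part of kubelet or the driver: succeed only if the
# first non-whitespace character of the given text is '{'.
starts_like_json_object() {
  first="$(printf '%s' "$1" | tr -d '[:space:]' | cut -c1)"
  [ "$first" = "{" ]
}

# Illustrative inputs (made up, not captured from a real node).
starts_like_json_object '{"status": "Success"}' && echo "looks like JSON"
starts_like_json_object 'Error: mount failed' || echo "does not look like JSON"
```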

Blobfuse has already been upgraded to 1.3.5 in the next AKS release: Azure/AgentBaker#439
@ggrunin about your issue, try the following workaround (it will install blobfuse 1.3.5 and flexvolume driver 1.0.16):

kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/binary/sysctl-install-blobfuse-1.3.5.yaml
kubectl apply -f https://raw.githubusercontent.com/Azure/kubernetes-volume-drivers/master/flexvolume/blobfuse/deployment/blobfuse-flexvol-installer-1.9.yaml

@andyzhangx Thanks! It helped!