Azure/kubernetes-volume-drivers

Error mounting blob storage inside Kubeflow Notebook

szinck1 opened this issue · 12 comments

I'm attempting to mount blob storage inside a Kubeflow Jupyter Notebook. I extended the YAML that creates the Notebook (a custom resource) and get the error below; the same config works fine for a regular Pod. I'm having trouble tracking down where the error message MountVolume.SetUp failed for volume "test" : mount command failed, status: Failure, reason: validation failed, error log:container is empty comes from and what exactly it means. The events and the config I'm using are below for those interested.

I'd appreciate any help. Thanks.

14s         Warning   FailedMount             pod/testblobmount-0                             MountVolume.SetUp failed for volume "test" : mount command failed, status: Failure, reason: validation failed, error log:container is empty
29s         Warning   FailedScheduling        pod/testblobmount-0                             pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
28s         Normal    Scheduled               pod/testblobmount-0                             Successfully assigned dsd/testblobmount-0 to aks-nodepool1-34358289-vmss000002
29s         Normal    SuccessfulCreate        statefulset/testblobmount                       create Pod testblobmount-0 in StatefulSet testblobmount successful
29s         Normal    ProvisioningSucceeded   persistentvolumeclaim/workspace-testblobmount   Successfully provisioned volume pvc-acb4dc9d-ad50-47f6-97c6-917bfdce83a9 using kubernetes.io/azure-disk

apiVersion: kubeflow.org/v1alpha1
kind: Notebook
metadata:
  name: {name}
  namespace: {namespace}
  labels:
    app: {name}
spec:
  template:
    spec:
      serviceAccountName: {serviceAccount}
      containers:
        - name: {name}
          image: ""
          volumeMounts: []
          env: []
          resources:
            requests:
              cpu: "0.1"
              memory: "0.1Gi"
      ttlSecondsAfterFinished: 300
      volumes:
      - name: test
        flexVolume:
          driver: "azure/blobfuse"
          readOnly: false
          secretRef:
            name: blobfusecreds
        options:
          container: blobtest
          tmppath: /tmp/blobfuse

Your indentation could be wrong; options should be nested under flexVolume:

      volumes:
      - name: test
        flexVolume:
          driver: "azure/blobfuse"
          readOnly: false
          secretRef:
            name: blobfusecreds
          options:
            container: blobtest
            tmppath: /tmp/blobfuse
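
To make the indentation difference concrete, here is a minimal Python sketch that checks where `options` sits in an already-parsed volume entry. The function name `misplaced_flexvolume_options` is hypothetical, and the two dictionaries simply mirror the original and corrected YAML in this thread; it is an illustration, not part of the driver.

```python
def misplaced_flexvolume_options(volume: dict) -> bool:
    """Return True when `options` is a sibling of `flexVolume` instead of nested in it."""
    flex = volume.get("flexVolume") or {}
    return "options" in volume and "options" not in flex


# Parsed form of the original snippet: `options` sits next to `flexVolume`.
original = {
    "name": "test",
    "flexVolume": {
        "driver": "azure/blobfuse",
        "readOnly": False,
        "secretRef": {"name": "blobfusecreds"},
    },
    "options": {"container": "blobtest", "tmppath": "/tmp/blobfuse"},
}

# Parsed form of the corrected snippet: `options` is nested under `flexVolume`.
corrected = {
    "name": "test",
    "flexVolume": {
        "driver": "azure/blobfuse",
        "readOnly": False,
        "secretRef": {"name": "blobfusecreds"},
        "options": {"container": "blobtest", "tmppath": "/tmp/blobfuse"},
    },
}

print(misplaced_flexvolume_options(original))   # True
print(misplaced_flexvolume_options(corrected))  # False
```

With the original layout, `container` is not part of the flexVolume definition at all, which would be consistent with the "container is empty" validation message.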

Another possible root cause: the jq package is not installed on your agent node.
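
A quick way to test the jq theory is to look for the binary on the node's PATH. The sketch below assumes you can run Python on the agent node; the helper name `jq_path` is hypothetical.

```python
import shutil


def jq_path():
    """Return the full path to jq if it is on PATH, otherwise None."""
    return shutil.which("jq")


if __name__ == "__main__":
    path = jq_path()
    print(f"jq found at {path}" if path else "jq not found on PATH")
```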

By the way, I would suggest using the blobfuse CSI driver:
https://github.com/csi-driver/blobfuse-csi-driver
The original blobfuse flex volume driver is now in maintenance mode.

Thanks. I've uninstalled the flex volume driver and installed the CSI driver.
What is supposed to go in volumeHandle? The default is volumeHandle: arbitrary-volumeid, which doesn't work. The code looks to me like it's supposed to be "#resourcename#pvname", but that isn't working for me either.

62s Warning FailedMount pod/nginx-blobfuse MountVolume.MountDevice failed for volume "pv-blobfuse" : rpc error: code = Unknown desc = error parsing volume id: "arbitrary-volumeid", should at least contain two #

And with my resource group and PV name:

7m42s Warning FailedMount pod/nginx-blobfuse MountVolume.MountDevice failed for volume "pv-blobfuse" : rpc error: code = Unknown desc = no key for storage account(our-storage-account-name) under resource group(rg), err Retriable: true, RetryAfter: 0s, HTTPStatusCode: -1, RawError: storage.AccountsClient#ListKeys: Invalid input: autorest/validation: validation failed: parameter=accountName constraint=MaxLength value="our-storage-account-name" details: value length must be less than or equal to 24
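
Both errors above are validation failures, and they can be reproduced offline. The sketch below is a rough approximation, not the driver's code: it assumes the volume ID layout is `<resource-group>#<storage-account>#<container>` (an assumption, not confirmed in this thread), and the function name `parse_volume_id` is hypothetical. The account-name rule (3 to 24 characters, lowercase letters and digits) is Azure's documented naming constraint, which matches the MaxLength 24 in the second error.

```python
import re
from typing import Tuple

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def parse_volume_id(volume_id: str) -> Tuple[str, str, str]:
    """Split a volume ID on '#' and sanity-check the storage account segment.

    Assumed layout (not confirmed in this thread):
    <resource-group>#<storage-account>#<container>
    """
    parts = volume_id.split("#")
    if len(parts) < 3:  # fewer than two '#' separators
        raise ValueError(f'volume id "{volume_id}" should contain at least two #')
    resource_group, account, container = parts[0], parts[1], parts[2]
    if not ACCOUNT_NAME.fullmatch(account):
        raise ValueError(f'"{account}" is not a valid storage account name')
    return resource_group, account, container


print(parse_volume_id("rg#mystorageacct#blobtest"))
# ('rg', 'mystorageacct', 'blobtest')
```

Passing `arbitrary-volumeid` raises the first kind of error, and any second segment longer than 24 characters raises the second.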

If your service principal has access to the storage account, you could try this storage class:
https://github.com/kubernetes-sigs/blobfuse-csi-driver/blob/master/deploy/example/storageclass-blobfuse-csi-existing-container.yaml

Using the SP works; however, I wasn't able to get the CSI driver working with a secret.

The SP works in the same RG that Kubernetes is running in, but not in another resource group. Are there any constraints on RG location? Are permissions beyond "Storage Contributor" required for the SP?

For the secret-based setup, could you use nodeStageSecretRef in the PV config? There is a breaking change on the master branch, and the doc still needs to be updated:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-azurefile
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain #If set as "Delete" file share would be removed after pvc deletion
  csi:
    driver: file.csi.azure.com
    readOnly: false
    volumeHandle: arbitrary-volumeid
    volumeAttributes:
      shareName: EXISTING_FILE_SHARE_NAME  #only file share name, don't use full path, e.g. https://storageaccount.file.core.windows.net/filesharename
    nodeStageSecretRef:
      name: azure-secret
      namespace: default

If your SP has access to other RG, then it should also work.
"Storage Contributor" related permissions should be granted to your SP.

FYI: kubernetes-sigs/azurefile-csi-driver#161 fixed the PV config (with secret) doc issue.

I'm using blob not file storage. Is this still relevant?

Yes, blobfuse should also use nodeStageSecretRef. Related doc PR: kubernetes-sigs/blob-csi-driver#108
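
As a small sanity check for the advice above, the following Python sketch inspects an already-parsed PV manifest and reports whether `spec.csi.nodeStageSecretRef` is present with both `name` and `namespace`. The function name `node_stage_secret_problems` is hypothetical, and the sample dictionary mirrors only the relevant part of the PV shown earlier in this thread.

```python
def node_stage_secret_problems(pv: dict) -> list:
    """List what is missing from spec.csi.nodeStageSecretRef in a parsed PV manifest."""
    csi = pv.get("spec", {}).get("csi", {})
    ref = csi.get("nodeStageSecretRef")
    if not ref:
        return ["spec.csi.nodeStageSecretRef is missing"]
    return [
        f"nodeStageSecretRef.{key} is missing"
        for key in ("name", "namespace")
        if not ref.get(key)
    ]


# Relevant part of the PV from this thread, as a parsed dictionary.
pv = {
    "kind": "PersistentVolume",
    "spec": {
        "csi": {
            "driver": "file.csi.azure.com",
            "volumeHandle": "arbitrary-volumeid",
            "nodeStageSecretRef": {"name": "azure-secret", "namespace": "default"},
        }
    },
}

print(node_stage_secret_problems(pv))  # []
```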

@andyzhangx

Based on my experiments, it seems that the resource group and storage account need to be in the same region as the Kubernetes cluster. They also need to be in the same subscription (maybe not surprising, but it tripped me up because I didn't realize the storage account I was given was in a different subscription).
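
The subscription mismatch is easy to miss, so here is a minimal Python sketch that compares the subscription segment of two Azure resource IDs. It relies only on the standard resource ID shape, `/subscriptions/<subscription-id>/...`; the helper names and the IDs below are made-up placeholders. It does not check the region, which is not part of the resource ID.

```python
def subscription_of(resource_id: str) -> str:
    """Extract the subscription ID from an Azure resource ID."""
    parts = [p for p in resource_id.split("/") if p]
    # Azure resource IDs start with /subscriptions/<subscription-id>/...
    if len(parts) < 2 or parts[0].lower() != "subscriptions":
        raise ValueError(f"not an Azure resource ID: {resource_id!r}")
    return parts[1].lower()


def same_subscription(first_id: str, second_id: str) -> bool:
    """Return True when both resource IDs belong to the same subscription."""
    return subscription_of(first_id) == subscription_of(second_id)


# Placeholder IDs for illustration only.
cluster_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000001"
    "/resourceGroups/rg/providers/Microsoft.ContainerService/managedClusters/aks"
)
storage_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000002"
    "/resourceGroups/other-rg/providers/Microsoft.Storage/storageAccounts/mystorageacct"
)

print(same_subscription(cluster_id, storage_id))  # False
```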

I'm using nodeStageSecretRef.

Thanks for your help!