Error mounting blob storage inside Kubeflow Notebook
szinck1 opened this issue · 12 comments
I'm attempting to mount blob storage inside a Kubeflow Jupyter Notebook. I extended the YAML that creates the Notebook (a Custom Resource) and got the error below, although the same volume config works fine for a regular Pod. I'm having trouble tracking down where this error message comes from and what exactly it means:
MountVolume.SetUp failed for volume "test" : mount command failed, status: Failure, reason: validation failed, error log:container is empty
The events and the config I'm using are below for those interested.
Appreciate any help. Thanks.
14s Warning FailedMount pod/testblobmount-0 MountVolume.SetUp failed for volume "test" : mount command failed, status: Failure, reason: validation failed, error log:container is empty
29s Warning FailedScheduling pod/testblobmount-0 pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
28s Normal Scheduled pod/testblobmount-0 Successfully assigned dsd/testblobmount-0 to aks-nodepool1-34358289-vmss000002
29s Normal SuccessfulCreate statefulset/testblobmount create Pod testblobmount-0 in StatefulSet testblobmount successful
29s Normal ProvisioningSucceeded persistentvolumeclaim/workspace-testblobmount Successfully provisioned volume pvc-acb4dc9d-ad50-47f6-97c6-917bfdce83a9 using kubernetes.io/azure-disk
apiVersion: kubeflow.org/v1alpha1
kind: Notebook
metadata:
  name: {name}
  namespace: {namespace}
  labels:
    app: {name}
spec:
  template:
    spec:
      serviceAccountName: {serviceAccount}
      containers:
      - name: {name}
        image: ""
        volumeMounts: []
        env: []
        resources:
          requests:
            cpu: "0.1"
            memory: "0.1Gi"
      ttlSecondsAfterFinished: 300
      volumes:
      - name: test
        flexVolume:
          driver: "azure/blobfuse"
          readOnly: false
          secretRef:
            name: blobfusecreds
          options:
            container: blobtest
            tmppath: /tmp/blobfuse
Your indentation could be wrong. It should be:
volumes:
- name: test
  flexVolume:
    driver: "azure/blobfuse"
    readOnly: false
    secretRef:
      name: blobfusecreds
    options:
      container: blobtest
      tmppath: /tmp/blobfuse
Another possible root cause: the jq package is not installed on your agent node.
By the way, I would suggest using the blobfuse CSI driver instead:
https://github.com/csi-driver/blobfuse-csi-driver
The original blobfuse flex volume driver is now in maintenance mode.
Thanks. I've uninstalled the flex volume driver and installed the CSI driver.
What is supposed to go in volumeHandle? The default, volumeHandle: arbitrary-volumeid, doesn't work. From the code it looks like it should be "#resourcename#pvname", but that isn't working for me either.
With the default value:
62s Warning FailedMount pod/nginx-blobfuse MountVolume.MountDevice failed for volume "pv-blobfuse" : rpc error: code = Unknown desc = error parsing volume id: "arbitrary-volumeid", should at least contain two #
And with my resource group and pv name:
7m42s Warning FailedMount pod/nginx-blobfuse MountVolume.MountDevice failed for volume "pv-blobfuse" : rpc error: code = Unknown desc = no key for storage account(our-storage-account-name) under resource group(rg), err Retriable: true, RetryAfter: 0s, HTTPStatusCode: -1, RawError: storage.AccountsClient#ListKeys: Invalid input: autorest/validation: validation failed: parameter=accountName constraint=MaxLength value="our-storage-account-name" details: value length must be less than or equal to 24
If your service principal has access to the storage account, you could try this storage class:
https://github.com/kubernetes-sigs/blobfuse-csi-driver/blob/master/deploy/example/storageclass-blobfuse-csi-existing-container.yaml
Back to your original error: I think the secret was not set correctly. Please
- follow this guide to install the CSI driver: https://github.com/kubernetes-sigs/blobfuse-csi-driver/blob/master/docs/install-csi-driver-v0.4.0.md
- follow this guide to use the CSI driver: https://github.com/kubernetes-sigs/blobfuse-csi-driver/blob/master/deploy/example/e2e_usage.md#dynamic-provisioning-create-storage-account-and-container-by-blobfuse-driver
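For readers who cannot open the linked storage class example, here is a rough sketch of what a storage class pointing at an existing container might look like. It is not copied from the linked file: the class name is hypothetical, and the provisioner name and parameter keys (resourceGroup, storageAccount, containerName) are assumptions that should be checked against the driver version you installed.

```yaml
# Sketch only; every value below is a placeholder or an assumption.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: blob-existing-container        # hypothetical name
provisioner: blob.csi.azure.com        # assumption: must match the name your driver registers
parameters:
  resourceGroup: EXISTING_RESOURCE_GROUP_NAME
  storageAccount: EXISTING_STORAGE_ACCOUNT_NAME   # at most 24 characters, per the error above
  containerName: EXISTING_CONTAINER_NAME
reclaimPolicy: Retain                  # keep the container when the PVC is deleted
volumeBindingMode: Immediate
```

The provisioner value has to match whatever `kubectl get csidriver` reports on the cluster; older releases of the driver may register a different name.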
Using the SP works, however I wasn't able to get the CSI driver working with a secret.
SP works in the same RG that Kubernetes is running in, but not in another resource group. Are there any constraints in terms of RG location? Are more permissions beyond "Storage Contributor" required for the SP?
> Using the SP works, however I wasn't able to get the CSI driver working with a secret.
Could you use nodeStageSecretRef in the PV config? There is a breaking change on the master branch, and the doc still needs to be updated:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-azurefile
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain  # If set as "Delete" file share would be removed after pvc deletion
  csi:
    driver: file.csi.azure.com
    readOnly: false
    volumeHandle: arbitrary-volumeid
    volumeAttributes:
      shareName: EXISTING_FILE_SHARE_NAME  # only file share name, don't use full path, e.g. https://storageaccount.file.core.windows.net/filesharename
    nodeStageSecretRef:
      name: azure-secret
      namespace: default
If your SP has access to the other RG, it should also work.
"Storage Contributor"-related permissions should be granted to your SP.
FYI: kubernetes-sigs/azurefile-csi-driver#161 fixed the PV config (with secret) doc issue.
I'm using blob not file storage. Is this still relevant?
> I'm using blob not file storage. Is this still relevant?
Yes, blobfuse should also use nodeStageSecretRef. Related doc PR: kubernetes-sigs/blob-csi-driver#108
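As a hedged illustration of the same pattern applied to blob storage, the sketch below adapts the azurefile PV from earlier in the thread. The PV name, the driver name, the containerName attribute, and the two secret keys are assumptions, not taken from this thread; the linked doc PR is the authoritative source.

```yaml
# Sketch only: static PV for an existing blob container, authenticated via a secret.
apiVersion: v1
kind: Secret
metadata:
  name: azure-secret
  namespace: default
type: Opaque
stringData:
  azurestorageaccountname: ACCOUNT_NAME   # assumed key name; placeholder value
  azurestorageaccountkey: ACCOUNT_KEY     # assumed key name; placeholder value
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-blob                           # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: blob.csi.azure.com            # assumption: use the name your installed driver registers
    readOnly: false
    volumeHandle: VOLUME_ID               # required format depends on the driver version
    volumeAttributes:
      containerName: EXISTING_CONTAINER_NAME   # assumed attribute name
    nodeStageSecretRef:                   # same field as in the azurefile example
      name: azure-secret
      namespace: default
```

Note that the driver release used earlier in this thread rejected volume IDs that did not contain at least two "#" characters, so the accepted volumeHandle format should be confirmed for the version in use.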
Based on my experiments, it seems that the Resource Group and Storage Account need to be in the same region as the Kubernetes cluster. They also need to be in the same subscription (maybe not surprising, but it tripped me up because I didn't realize the storage account I was given was in a different subscription).
I'm using nodeStageSecretRef.
Thanks for your help!
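To tie this back to the original question, a CSI-backed volume would probably be attached to the Notebook through a PersistentVolumeClaim rather than a flexVolume entry. The following is a minimal sketch under that assumption; the claim name, PV name, namespace placeholder, and mount path are all hypothetical, and it has not been tested against the Kubeflow Notebook controller.

```yaml
# Sketch only: claim a pre-created PV, then reference the claim from the Notebook pod spec.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-blob                 # hypothetical name
  namespace: NOTEBOOK_NAMESPACE  # placeholder: same namespace as the Notebook
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  volumeName: PV_NAME            # placeholder: name of an existing PersistentVolume
  storageClassName: ""           # empty string disables dynamic provisioning for this claim
---
# Fragment of spec.template.spec in the Notebook resource (not a complete manifest).
containers:
  - name: NOTEBOOK_NAME
    volumeMounts:
      - name: test
        mountPath: /mnt/blob     # hypothetical mount path
volumes:
  - name: test
    persistentVolumeClaim:
      claimName: pvc-blob
```

The second document is only a fragment; it would be merged into the existing containers and volumes lists of the Notebook template rather than applied on its own.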