purestorage/helm-charts

pure-csi not found in the list of registered CSI drivers

czankel opened this issue · 12 comments

I'm getting the following error trying to mount a file system:

csi_attacher.go:330] kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name pure-csi not found in the list of registered CSI drivers

However, the driver shows up under csidrivers:

$ oc get csidrivers.storage.k8s.io
NAME       CREATED AT
pure-csi   2019-08-22T17:00:13Z

Note that this is for FlashBlade, using the 'operator-csi-plugin' and some snapshot of Kubernetes v1.14.0.

You say k8s 1.14, but you are also using oc as the command. Can we assume you are using a version of OpenShift 4.x? What version would that be?

We are probably going to need a lot of log files to work out where the issue lies.
Can you post the describe output from the pod that is failing the mount, as well as the logs from the pso-operator and pure-provisioner pods, and from the pure-csi pod on the node that is running the failing application pod? System log files from that node would also help.

The logs @sdodsley pointed out will be important.
Can you also check whether this path exists on the node: /var/lib/kubelet/plugins/pure-csi/csi.sock?
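For example, something like this on the worker node (the second command is just a sanity check of what the kubelet has under its plugins directory):

# does the driver's registration socket exist?
ls -l /var/lib/kubelet/plugins/pure-csi/csi.sock
# what does the kubelet have under its plugins directory?
ls /var/lib/kubelet/plugins/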

Sorry for the delay, the system was offline over the weekend.

It's a daily snapshot of OpenShift 4.2 that is based on k8s 1.14 (AFAIK).
I'll see about getting you the logs as requested.

For the path, I don't see the pure-csi path on the worker node, only /var/lib/kubelet/plugins/kubernetes.io/csi/pv/

One question I had: I think the API changed in 1.14, and I want to make sure I'm not running into this (from the 1.14 release notes):

  • csi.storage.k8s.io/v1alpha1 CSINodeInfo and CSIDriver CRDs are no longer supported.
  • New storage.k8s.io/v1beta1 CSINode and CSIDriver objects were introduced.
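For reference, the storage API group/versions the cluster actually serves can be listed with something like:

oc api-versions | grep storage.k8s.io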

We are using storage.k8s.io/v1beta1:

apiVersion: storage.k8s.io/v1beta1
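For illustration, a CSIDriver object on the v1beta1 API looks roughly like this (the spec fields shown are only an example, not necessarily what the chart installs):

apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: pure-csi
spec:
  attachRequired: true     # example value only
  podInfoOnMount: true     # example value only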

I found the issue, but I don't really have a true fix, only a work-around. This probably only affects OpenShift, which is more pedantic about security: the default service account didn't have enough privileges.

  1. Add the 'privileged' security context constraint (SCC) to the default service account:
    oc adm policy add-scc-to-user privileged -z default -n pure-csi-operator

  2. Force a restart of the pure-csi daemonset (otherwise it might take a while to take effect):
    oc delete daemonset pure-csi

Not sure if there's a better RBAC rule to use here, but this worked.
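To confirm the change took effect, something along these lines should work (assuming everything lives in the pure-csi-operator namespace used above):

# the namespace's default service account should now appear under 'users'
oc get scc privileged -o yaml | grep -A10 'users:'
# the operator recreates the daemonset; watch the pods come back
oc get pods -n pure-csi-operator -w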

Side-note, the service account is defined by the pso-operator:
oc get pod pso-operator-XXXXXXXXX-YYYYY -o yaml|grep serviceAccount
serviceAccount: default
serviceAccountName: default

Thanks for letting us know. We were limiting the SCC to hostpath, and that was sufficient for non-CSI drivers pre-4.x. It seems like we may need privileged for CSI drivers in 4.x.

@czankel I submitted PR #116. Do you think you can give this a try?

Unfortunately, that didn't seem to work (I had some other issues, so I can't rule out that it could have worked). However, a couple of questions:

  • it uses the ClusterRoleBinding user, which is 'pure' by default, yet I had to use the 'default' user to get it to work instead.
  • there doesn't seem to be a hostpath SCC in OpenShift 4.

Not sure when I can try it again.

OK, so it looks like I made a mistake.
The 'pure' ServiceAccount is the ServiceAccount used by our pods created here:

kind: ServiceAccount

The host path SCC is created by the install script for the 'pure' ServiceAccount to add the privileges required by our driver.

The pso-operator (that installs the pso driver) requires the privileged SCC. The pso-operator uses the default user/ServiceAccount and does not use the 'pure' ServiceAccount above. This is why you needed to add the privileged SCC to the default user.

serviceAccountName: default
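A quick way to see the split (the deployment name pso-operator is an assumption here; adjust it to whatever the operator deployment is actually called):

# the driver daemonset runs as the 'pure' ServiceAccount
oc get daemonset pure-csi -o jsonpath='{.spec.template.spec.serviceAccountName}'
# the operator runs as 'default', which is why that account needed the privileged SCC
oc get deployment pso-operator -o jsonpath='{.spec.template.spec.serviceAccountName}'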

So we can either change the install script to add the privileged SCC to the default user, or document it. My current feeling is that it should be documented, so that it's clear and we aren't adding extra privileges without admins being aware.
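If we scripted it, it would amount to roughly one extra line in the install script, along these lines (NAMESPACE is just a placeholder):

# hypothetical addition to the operator install script
oc adm policy add-scc-to-user privileged -z default -n "${NAMESPACE}"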

Updating documentation: #117

Closing as #117 has now merged.