kubernetes-sigs/gcp-compute-persistent-disk-csi-driver

Driver can display the secret content of node-expand-secret-name

mpatlasov opened this issue · 10 comments

What happened?

A user with permissions to create or modify a StorageClass can get the contents of arbitrary Secrets printed in the driver logs.

The problem comes from a klog print which is not sanitized:

	// Note that secrets are not included in any RPC message. In the past protosanitizer and other log
	// stripping was shown to cause a significant increase of CPU usage (see
	// https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/issues/356#issuecomment-550529004).
	klog.V(4).Infof("%s called with request: %s", info.FullMethod, req)

The comment says that secrets are not included in any RPC message; however, there are many RPC messages that may include secrets, for example NodeExpandVolume as explained here. If one adds:

parameters:
  csi.storage.k8s.io/node-expand-secret-name: test-secret

to the StorageClass, then after PV resize one can see in logs:

I0915 03:09:16.116829       1 utils.go:66] /csi.v1.Node/NodeExpandVolume called with request: volume_id:"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa" volume_path:"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount" capacity_range:<required_bytes:3221225472 > staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:"password" value:"t0p-Secret0" > secrets:<key:"username" value:"admin" > 

Note that there are 7 more RPC message types with secrets listed here.

The GCP CSI driver does not seem to use these features, so the threat comes only from a malicious user with elevated permissions who intentionally wants to expose secrets.

What did you expect to happen?

The log print should be sanitized, e.g.:

I0915 03:29:10.005165       1 utils.go:81] /csi.v1.Node/NodeExpandVolume called with request: {"capacity_range":{"required_bytes":4294967296},"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_id":"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa","volume_path":"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount"}
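
For illustration, here is a minimal, self-contained sketch of how output like the above could be produced, assuming the protosanitizer package from kubernetes-csi/csi-lib-utils is used (its String() method renders every field marked with the CSI csi_secret option as "***stripped***"); this is not necessarily the exact patch:

  package main

  import (
    "fmt"

    csi "github.com/container-storage-interface/spec/lib/go/csi"
    "github.com/kubernetes-csi/csi-lib-utils/protosanitizer"
  )

  func main() {
    // A NodeExpandVolumeRequest carrying secrets, as in the log line above.
    req := &csi.NodeExpandVolumeRequest{
      VolumeId: "projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa",
      Secrets:  map[string]string{"username": "admin", "password": "t0p-Secret"},
    }
    // StripSecrets returns a fmt.Stringer; formatting it prints the request
    // as JSON with the secrets field replaced by "***stripped***".
    fmt.Printf("request: %s\n", protosanitizer.StripSecrets(req))
  }

Note that the existing comment in utils.go cites a CPU-usage concern with protosanitizer; the log-level guard discussed further below is meant to address that.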

How can we reproduce it?

  1. Create a Secret:
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  namespace: default
stringData:
  username: admin
  password: t0p-Secret
  2. Create a StorageClass:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test5
parameters:
  csi.storage.k8s.io/node-expand-secret-name: test-secret
  csi.storage.k8s.io/node-expand-secret-namespace: default
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
  3. Create a PVC and a Pod

  4. Update the CSI driver log level to debug

  5. Resize the volume

  6. Check the CSI driver logs:

I0915 03:09:16.116829       1 utils.go:66] /csi.v1.Node/NodeExpandVolume called with request: volume_id:"projects/openshift-gce-devel/zones/us-central1-a/disks/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa" volume_path:"/var/lib/kubelet/pods/2df06db8-b19a-4b7d-8aab-4d6eb65f0df0/volumes/kubernetes.io~csi/pvc-11fd4016-7702-4c0c-9fae-71eddea3e4fa/mount" capacity_range:<required_bytes:3221225472 > staging_target_path:"/var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/0b0a5be8b1e2d20b1e05a9f1f6d7b1ea36fcf6acfade1a9e448a088e20fe38bf/globalmount" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:"password" value:"t0p-Secret0" > secrets:<key:"username" value:"admin" > 

Anything else we need to know?

n/a

/cc @msau42, @mattcary, @cjcullen, @tallclair, @jsafrane

/assign

I have a patch to sanitize specific RPC request types, but before opening a PR I'd like to get feedback on the issue severity. The severity probably depends on the possibility of having permissions high enough to create or modify a StorageClass, but not as high as a cluster admin who can print secrets in a simpler way.

We discussed it at today's sig-storage triage meeting and agreed that it should be fixed by sanitizing requests, but only when the debug log level is enabled:

  if klog.V(4).Enabled() {
    klog.Infof("%s called with request: %s", info.FullMethod, sanitize(req))
  }

Then the sanitization won't get called in production or wherever performance is required.
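
Putting the guard and the stripping together, a sketch of what the interceptor could look like (the function name and surrounding code are illustrative, and the eventual patch may differ), assuming protosanitizer is used for the stripping:

  import (
    "context"

    "github.com/kubernetes-csi/csi-lib-utils/protosanitizer"
    "google.golang.org/grpc"
    "k8s.io/klog/v2"
  )

  // logGRPC is a gRPC unary interceptor; sanitization only happens when
  // verbosity level 4 is enabled, so it costs nothing at lower levels.
  func logGRPC(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler) (interface{}, error) {
    if klog.V(4).Enabled() {
      // Fields marked with the CSI csi_secret option are rendered as "***stripped***".
      klog.Infof("%s called with request: %s", info.FullMethod, protosanitizer.StripSecrets(req))
    }
    return handler(ctx, req)
  }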

msau42 commented

For production we actually use log level 4. The pdcsi driver does not actually require any secrets though? Maybe we need to look at ways to disallow/ignore secrets being passed in the first place.

Yes, the CSI driver does not need any secrets; it just logs them and then ignores them. Currently it's not possible to tell the CSI sidecars / kubelet which secrets the driver actually uses, and I am not sure that's the right direction - it could be error prone when adding a new method / secret.

If there were a CSIDriver field that said whether the driver used secrets at all, then there could be, e.g., an admission webhook that only lets drivers known to use secrets be installed (e.g., this driver doesn't use secrets at all).

I understand there'd be backwards compatibility issues with this, but it's an idea.