kubernetes-sigs/gcp-compute-persistent-disk-csi-driver

Add support for multi-zone volumeHandles

pwschuurman opened this issue · 3 comments

A common pattern in Kubernetes workloads is to serve read-only content from a single ROX PV. When a Deployment runs across multiple zones, a user may want that content to be backed by a PD in each zone. The user can copy the underlying data to a PD in every zone, but in the driver today each of those PDs must be represented by a different PV. The driver should have a way to support specifying a volume handle that transparently references a ROX PD name that is available in multiple zones.
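For illustration, a minimal sketch of how the driver could recognize such a handle. The handle format and the `multi-zone` sentinel below are just one possible shape for discussion, not something the driver supports today:

```go
package main

import (
	"fmt"
	"strings"
)

// diskRef is the parsed form of a (hypothetical) volume handle
// projects/<project>/zones/<zone>/disks/<disk-name>. If the zone segment is
// the sentinel "multi-zone", the handle refers to identically named ROX disks
// that exist in several zones, and the driver would resolve the concrete zone
// at attach time from the node's topology.
type diskRef struct {
	Project   string
	Zone      string // "multi-zone" means: resolve per node
	Name      string
	MultiZone bool
}

func parseVolumeHandle(handle string) (diskRef, error) {
	parts := strings.Split(handle, "/")
	if len(parts) != 6 || parts[0] != "projects" || parts[2] != "zones" || parts[4] != "disks" {
		return diskRef{}, fmt.Errorf("unexpected volume handle format: %q", handle)
	}
	ref := diskRef{Project: parts[1], Zone: parts[3], Name: parts[5]}
	ref.MultiZone = ref.Zone == "multi-zone"
	return ref, nil
}

func main() {
	ref, err := parseVolumeHandle("projects/my-project/zones/multi-zone/disks/content-disk")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ref) // {Project:my-project Zone:multi-zone Name:content-disk MultiZone:true}
}
```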

That's a great idea. While we're discussing it, here is another problem that may be related to the one you describe: a read-only PD can be attached to only 10 instances.

It would also be nice to try to connect to multiple PDs in the same zone.

And if a dog can dream: I'd love to see something like automatic creation of PDs from a snapshot when there are no available volumes left to read from, but that is probably a different issue.

Anyway, thanks for the PR.

That's a great idea. While we're discussing it, here is another problem that may be related to the one you describe: a read-only PD can be attached to only 10 instances.

This is a good point, thanks for flagging. We don't have a good way to represent this in CSI (since volume limits are published per node). This would require some additional CSI specification changes to support, or a GCP-specific scheduling plugin on the CO layer.
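For context, the only attach limit a CSI driver can advertise is the per-node `MaxVolumesPerNode` in the NodeGetInfo response; a simplified sketch (not the driver's actual code, the node ID and limit below are placeholders):

```go
package main

import (
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// nodeGetInfo shows the only place the CSI spec lets a driver report attach
// limits: MaxVolumesPerNode on the NodeGetInfoResponse. The limit is scoped
// to the node, so a per-volume constraint such as "at most 10 read-only
// attachments per PD" has nowhere to live in the current spec.
func nodeGetInfo(nodeID string, maxVolumes int64) *csi.NodeGetInfoResponse {
	return &csi.NodeGetInfoResponse{
		NodeId:            nodeID,
		MaxVolumesPerNode: maxVolumes, // per-node, not per-volume
	}
}

func main() {
	resp := nodeGetInfo("projects/my-project/zones/us-central1-a/instances/node-1", 15)
	fmt.Println(resp.GetMaxVolumesPerNode())
}
```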

It would also be nice to try to connect to multiple PDs in the same zone.

What's the use case here, if the PDs are in the same zone? I believe a possible workaround would be to attach multiple PDs (eg: disk-0, disk-1, disk-2) to the same node, and access each of them at different mount points (eg: this could just be solved at the application specification layer in the CO). Can you elaborate on what specific use case this would solve?
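As a rough sketch of that workaround (all names and paths below are made up), the same pod could mount two PD-backed PVCs at distinct paths and let the application pick which copy of the data to read:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// multiDiskPod sketches a pod that mounts two separate PD-backed PVCs
// (pvc-disk-0, pvc-disk-1) read-only at different paths. All names are
// illustrative.
func multiDiskPod() *corev1.Pod {
	volume := func(name, claim string) corev1.Volume {
		return corev1.Volume{
			Name: name,
			VolumeSource: corev1.VolumeSource{
				PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
					ClaimName: claim,
					ReadOnly:  true,
				},
			},
		}
	}
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "reader"},
		Spec: corev1.PodSpec{
			Volumes: []corev1.Volume{volume("disk-0", "pvc-disk-0"), volume("disk-1", "pvc-disk-1")},
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "busybox",
				VolumeMounts: []corev1.VolumeMount{
					{Name: "disk-0", MountPath: "/data/disk-0", ReadOnly: true},
					{Name: "disk-1", MountPath: "/data/disk-1", ReadOnly: true},
				},
			}},
		},
	}
}

func main() { _ = multiDiskPod() }
```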

And if a dog can dream: I'd love to see something like automatic creation of PDs from a snapshot when there are no available volumes left to read from, but that is probably a different issue.

Just to confirm, would the idea here be to get around the 10-attachments-per-PD limit? So on the 11th attachment, a new disk would be dynamically provisioned?

Thanks for the quick reply.

What's the use case here

Let's say we have 20 VMs and two PDs (disk1 and disk2) in the same zone.

For the first ten instances, CSI will be able to mount disk1.

Mount request number eleven will fail because of the «max 10 readers» limitation. It would be nice if CSI then tried the next disk, disk2.
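Roughly, I'd imagine something like the sketch below (pure illustration; `attachReadOnly` and the error value are made up, not real driver or GCE API code):

```go
package main

import (
	"errors"
	"fmt"
)

// errMaxReaders stands in for the error GCE would return when a read-only PD
// already has 10 attachments. attachReadOnly is a placeholder; it only exists
// to illustrate the desired "try the next disk" behavior.
var errMaxReaders = errors.New("disk already attached to maximum number of instances")

func attachReadOnly(instance, disk string) error {
	// Placeholder: a real implementation would call the GCE attach-disk API.
	return nil
}

// attachFirstAvailable tries each candidate disk in order and stops at the
// first one that still has attachment capacity.
func attachFirstAvailable(instance string, disks []string) (string, error) {
	for _, d := range disks {
		err := attachReadOnly(instance, d)
		if err == nil {
			return d, nil
		}
		if errors.Is(err, errMaxReaders) {
			continue // this disk is full, fall through to the next one
		}
		return "", err
	}
	return "", fmt.Errorf("no disk with free read-only attachment slots for %s", instance)
}

func main() {
	disk, _ := attachFirstAvailable("vm-11", []string{"disk1", "disk2"})
	fmt.Println("attached:", disk)
}
```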

Also, I'm thinking about IO distribution across PD readers, but I need to figure out how that works in Google Cloud.

Just to confirm, would the idea here be to get around the 10-attachments-per-PD limit? So on the 11th attachment, a new disk would be dynamically provisioned?

Yep. Let's say autoscaling adds 100 additional VMs; they may need more disks than are currently available. It would be useful if CSI could create a new disk from a snapshot on request.
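The equivalent of what my script does with gcloud is basically a single Disks.Insert call with a SourceSnapshot; a rough Go sketch (project, zone, and resource names are placeholders, and this is not driver code):

```go
package main

import (
	"context"
	"log"

	compute "google.golang.org/api/compute/v1"
)

// createDiskFromSnapshot sketches the GCE API call needed to provision a
// fresh read-only replica from an existing snapshot.
func createDiskFromSnapshot(ctx context.Context, svc *compute.Service, project, zone, name, snapshot string) error {
	disk := &compute.Disk{
		Name:           name,
		SourceSnapshot: "projects/" + project + "/global/snapshots/" + snapshot,
	}
	// A real implementation would also wait for the returned operation to complete.
	_, err := svc.Disks.Insert(project, zone, disk).Context(ctx).Do()
	return err
}

func main() {
	ctx := context.Background()
	svc, err := compute.NewService(ctx) // uses Application Default Credentials
	if err != nil {
		log.Fatal(err)
	}
	if err := createDiskFromSnapshot(ctx, svc, "my-project", "us-central1-a", "disk3", "content-snap"); err != nil {
		log.Fatal(err)
	}
}
```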

Currently, I'm using a DaemonSet with an init container that runs a bash script with many gcloud commands to achieve that.

Still, it's slow and may introduce additional bugs, and I don't like the whole idea of a custom DaemonSet for this.