digitalocean/csi-digitalocean

Cluster scaling up due to max volume count

dbpolito opened this issue · 3 comments

My cluster scaled up due to Volume Mount Limit:

  Warning  FailedScheduling        <unknown>  default-scheduler         0/2 nodes are available: 2 node(s) exceed max volume count.
  Warning  FailedScheduling        <unknown>  default-scheduler         0/2 nodes are available: 2 node(s) exceed max volume count.
  Warning  FailedScheduling        <unknown>  default-scheduler         0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) exceed max volume count.
  Normal   Scheduled               <unknown>  default-scheduler         Successfully assigned <container> to <pool-instance>

I took a look at https://www.digitalocean.com/docs/volumes/#limits, and the limit is not clear to me from there.

Everything worked: the volumes were created and my deploy went fine, but I didn't expect my cluster to scale up due to the number of volumes mounted.

Is there something I can do or configure to avoid that?

@dbpolito to make sure I understand your issue correctly: are you saying that you found your cluster scaling up due to the supposed volumes-per-node limitation but you think it shouldn't have done so; or do you think the upscale was justified to allow scheduling, but you'd rather see the cluster not autoscale (even if it means a to-be-scheduled pod stays stuck)?

I'll try to answer generally for now: we currently do not expose any knobs around tweaking the autoscaler (and to be honest, I don't know off the top of my head if settings exist for volume-related decision making). Each node can mount a maximum of 7 volumes; if you find your workload not being distributed evenly to take full advantage of your total volume capacity, you may want to set node / pod (anti-)affinity rules to steer your pods onto the right nodes.
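For example, here is a minimal sketch of what such a pod anti-affinity rule could look like. All names in it (the volume-workload Deployment, the app label, the nginx image) are placeholders for illustration, not anything from your setup:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: volume-workload          # placeholder name
  spec:
    replicas: 3
    selector:
      matchLabels:
        app: volume-workload
    template:
      metadata:
        labels:
          app: volume-workload
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      app: volume-workload
                  topologyKey: kubernetes.io/hostname  # spread replicas across distinct nodes
        containers:
          - name: app
            image: nginx           # placeholder image

Using the preferredDuringSchedulingIgnoredDuringExecution variant keeps this a soft preference, so pods can still land on a shared node when spreading them out is not possible.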

Happy to learn more about your specific use case so that we can discuss options.

"Each node can mount a maximum of 7 volumes"

So this is a limitation on DO's side for attaching volumes to Droplets:

[screenshot: DigitalOcean documentation on the per-Droplet volume attachment limit]

I was trying to understand the limit; while researching, I found the volume limit was defined by the CSI driver, so I was wondering if it was somehow configurable.
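For reference, it looks like the limit the driver reports surfaces on the Kubernetes side in the CSINode object; roughly something like this (the node name and the exact output shape are just my illustration):

  # kubectl get csinode <node-name> -o yaml
  spec:
    drivers:
      - name: dobs.csi.digitalocean.com
        allocatable:
          count: 7   # maximum number of volumes the driver will attach to this node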

This is indeed a limitation that I will need to work around in my use case, as I expect to have many volumes...

Given this volume limit plus the ReadWriteMany limitation, I might go with something like https://rook.io/

Thank you.

@dbpolito indeed, this is a limitation enforced by the DO storage backend. Rook.io sounds like an interesting alternative if you cannot or do not want to work around the limitation through scheduling.