container-storage-interface/spec

Track Additional Comments for ModifyVolume in KEP-3751

sunnylovestiramisu opened this issue · 5 comments

Is your feature request related to a problem?/Why is this needed

There are some final discussions around the exact wording of the Spec in #544, this issue is created to track those comments and revisit before the changes go to Beta.

Describe the solution you'd like in detail

  1. Include wording in GetCapacity API to specify that SP needs to make sure capacity being compatible with all the valid mutable parameters - conversation link

Describe alternatives you've considered
N/A

Additional context
Issue in Kubernetes is in kubernetes/enhancements#3780

There is an other thing that I would lke to see considered for ControllerModifyVolumeResponse, as mentioned in #544 (comment)

From me:

It is probably imaginable that a change in parameters requires re-staging the volume to apply the parameters to a new connection (think of hostname change of an NFS-server). It may not be required immediately, but the change of the parameters only gets into effect once the volume has been disconnected and newly connected.

A FAILED_PRECONDITION is not sufficient in that case, I think. Maybe the ControllerModifyVolumeResponse could include an attribute to signal this?

And a more detailed example from @jdef:

interesting proposal. something like..

  // when unset, communicates to the CO that one or more mutable parameters
  // require a volume lifecycle state change before becoming effective.
  string awaiting_state_transition = N;

.. which could be even tighter w/ an enum for the blocking state transition

The NFS-server hostname change is one example that should be very clear. Without unmounting and remounting the NFS-export the modified parameter (hostname) can not be applied, and the volume will keep using the original value. Unmounting is not possible if a workload is actively using the volume, so a restart of the container(s) with NodeUnstage/NodeStage (and/or Publish) is required.

For Ceph-CSI, an other realistic problem is configuration of IOPS/throughput. This needs to be configured client-side while attaching the volume (during NodeStageVolume and/or NoodePublishVolume). As there is no communication to the NodePlugin to reconfigure the parameters, it can not be changed once the volume is attached.

As the requirement to re-attach the volume depends on the modified parameter(s), a hint in ControllerModifyVolumeResponse so that the CO can restart the workload if needed would be very helpful.

Yes above make sense for the RBD CSI driver, we support both krbd and nbd to mount the volumes, for volumes mounted with krbd needs to be reattached when the IOPS value changes as the IOPS is set on the device mounted on the node, and some indication need to be set to the csi plugin on the node to change the value. We handle the volumes attached with nbd differently by setting at the ceph.

It would be good to return an indication from the Controller that Node operation is required for volume as we do for the Expansion workflow.

@nixpanic NodeModifyVolume is not included in this KEP proposal. Resize API and this ModifyVolume API is two separate APIs. Are you suggesting us to consider some consecutive operations for Resizing and ControllerModifyVolume?

Hi @sunnylovestiramisu , yes, I understand those are different operations.

For some Storage Providers, some of the options that are modifiable, the currently staged/published volume will use options until it is staged/published anew. If an option requires newly staging/publishing, the controller should be able to inform the CO about that, so that appropriate actions can be taken.

The CO could log the requirement so that an admin can restart the workload at a convenient time, or the CO could even stop/start a workload by itself.

A hint in the ControllerModifyVolumeResponse message that the CSI-driver can set would be good. Depending on what hints would be useful, only one hint or a list of hints could be returned. Here an example to illustrate what I am thinking of:

message ControllerModifyVolumeResponse {
  option (alpha_message) = true;

  enum FollowUpAction {
    UNKNOWN = 0;

    // no action required on the node
    NO_ACTION = 1;

    // action required on the node, but not immediately
    REPORT_UNAPPLIED_OPTION = 2;

    // schedule a restage as early as possible
    NODE_RESTAGE_REQUIRED = 3;
  }

  FollowUpAction followUpAction = 0; 
}

Taking this a little further, writing this while not having considered it earlier... There may be Storage Providers that require a workload to be inactive while an option is modified. In that case ControllerModifyVolume should probably return a suitable error, so that the CO can stop the workload (unpublish/unstage the volume), re-try the ControllerModifyVolume operation, and start the workload afterwards again. No idea how feasible that is though.

This probably needs a discussion among the larger crowd. And is it possible to implement node operations later as a followup P2 and we make sure it is extensible? Basically P0/P1 in this KEP will enable the modification function to the providers that only needs the controller operation, P2 enable providers that needs both.

We need to collect use case from all providers that need or will need the node operation first to propose a solution.

Referring to the ControllerExpandVolume API, the signal it gives about if a node operation is needed is using a node_expansion_required field to signal if there is further operation needed on the node. But the proposal here is more complicated than a bool and more states involved.