CSIStorageCapacity: Topology segment not updated
samuelluohaoen1 opened this issue · 20 comments
What happened:
After new node plugins join the cluster and report new AccessibleTopologies.Segments, the current segment information is not getting updated. New CSIStorageCapacity objects are not being created.
What you expected to happen:
New node plugins reporting new values for existing topology segments should, in a sense, "expand" the value sets of those segments, which in turn should result in CSIStorageCapacity objects being created for the newly accessible segments.
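To make the expected behavior concrete, here is a toy model in Python. It is not the external-provisioner implementation. It assumes that each node reports one segment (a map of topology key to value), that the set of known segments is the union of everything reported so far, and that one CSIStorageCapacity object is expected per (segment, storage class) pair. The function names and the storage class name "local" are hypothetical.

```python
from itertools import product


def merge_segments(known, reported):
    """Return the union of known and newly reported topology segments.

    Each reported segment is a dict such as
    {"kubernetes.io/hostname": "some-node"}; it is stored as a frozenset
    of its items so that it can be a member of a set.
    """
    return known | {frozenset(seg.items()) for seg in reported}


def expected_capacity_objects(segments, storage_classes):
    """Return one (segment, storage class) pair per expected object."""
    ordered = sorted(segments, key=sorted)  # deterministic order
    return [(dict(seg), sc) for seg, sc in product(ordered, storage_classes)]


# Controller plugin only: no segments, so nothing is expected.
segments = merge_segments(set(), [])
print(expected_capacity_objects(segments, ["local"]))

# A node plugin registers and reports its hostname segment.
segments = merge_segments(segments, [{"kubernetes.io/hostname": "some-node"}])
print(expected_capacity_objects(segments, ["local"]))
```

Under these assumptions the first call yields an empty list and the second yields a single pair for some-node, which is the object I expected to see created.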
How to reproduce it:
- Suppose the CSIDriver has name com.foo.bar. Check that STORAGECAPACITY is true.
- Deploy the controller plugin but not the node plugin. Wait for external-provisioner to print "Initial number of topology segments 0, storage classes 0, potential CSIStorageCapacity objects 0" (to see this log, run external-provisioner with log level 5).
- Now CSINode should have DRIVERS: 0.
- Deploy the node plugin. Wait for the NodeGetInfo RPC to be called. The RPC should return something like
  {
    "NodeId": "some-node",
    "AccessibleTopologies": {
      "Segments": {
        "kubernetes.io/hostname": "some-node"
      }
    }
  }
- Now CSINode should have DRIVERS: 1, which is named com.foo.bar with Node ID: some-node and Topology Keys: [kubernetes.io/hostname].
- Deploy a StorageClass with volumeBindingMode: WaitForFirstConsumer and provisioner: com.foo.bar.
- No new CSIStorageCapacity object is created.
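As a quick sanity check on the NodeGetInfo step, the sketch below parses a response of the shape given in the steps, with "Segments" written as a JSON object, and derives the topology keys that should then show up on the CSINode. The helper name topology_keys is hypothetical, and the field names follow the report rather than any particular client library.

```python
import json

# Response shape as described in the steps; "Segments" maps key to value.
response = json.loads("""
{
  "NodeId": "some-node",
  "AccessibleTopologies": {
    "Segments": {"kubernetes.io/hostname": "some-node"}
  }
}
""")


def topology_keys(node_get_info):
    """Return the sorted topology keys that a node plugin reported."""
    topology = node_get_info.get("AccessibleTopologies") or {}
    segments = topology.get("Segments") or {}
    return sorted(segments.keys())


print(topology_keys(response))  # ['kubernetes.io/hostname']
```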
Anything else we need to know?:
I am using the "kubernetes.io/hostname" label as the only key because we want topology to be constrained by each node. Each PV is to be provisioned locally on some node. I also assumed that "kubernetes.io/hostname" is unique across the nodes and should by default exist on every node (I hope this is a reasonable assumption).
Environment:
- Driver version: v3.0.0
- Kubernetes version (use kubectl version): 1.25+
- OS (e.g. from /etc/os-release): Our in-house OS which is very similar to CentOS
- Kernel (e.g. uname -a): Linux 4.18.0
- Install tools: kubeadm
- Others:
No new CSIStorageCapacity object is created.
How do you check for this? With kubectl get csistoragecapacities or kubectl get --all-namespaces csistoragecapacities?
CSIStorageCapacity objects are namespaced, so the second command has to be used.
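When checking by hand, it can help to filter the JSON produced by kubectl get --all-namespaces csistoragecapacities -o json by storage class. The following is a minimal sketch in Python that works on the parsed listing. The sample data is made up for illustration and is not output from a real cluster; the helper name capacities_for is hypothetical, and only the metadata, storageClassName and capacity fields of each object are read.

```python
def capacities_for(listing, storage_class):
    """Return (namespace, name, capacity) for each object of a storage class.

    `listing` is the parsed JSON of
    `kubectl get --all-namespaces csistoragecapacities -o json`.
    """
    result = []
    for item in listing.get("items", []):
        if item.get("storageClassName") == storage_class:
            meta = item.get("metadata", {})
            result.append(
                (meta.get("namespace"), meta.get("name"), item.get("capacity"))
            )
    return result


# Hypothetical sample, not taken from a real cluster.
sample = {
    "items": [
        {
            "metadata": {"namespace": "default", "name": "csisc-abc"},
            "storageClassName": "csi-hostpath-fast",
            "capacity": "100Gi",
        },
        {
            "metadata": {"namespace": "kube-system", "name": "csisc-def"},
            "storageClassName": "other",
            "capacity": "1Gi",
        },
    ]
}
print(capacities_for(sample, "csi-hostpath-fast"))
```

An empty result for the storage class in question, across all namespaces, would confirm that no object was created for it.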
I tried to reproduce the issue with csi-driver-host-path v1.10.0, but there I get new CSIStorageCapacity objects after creating a storage class.
My commands:
/deploy/kubernetes-distributed/deploy.sh
kubectl delete storageclass.storage.k8s.io/csi-hostpath-slow
kubectl delete storageclass.storage.k8s.io/csi-hostpath-fast
kubectl get --all-namespaces csistoragecapacity
kubectl create -f deploy/kubernetes-distributed/hostpath/csi-hostpath-storageclass-fast.yaml
kubectl get --all-namespaces csistoragecapacity
csi-provisioner:v3.3.0
No new CSIStorageCapacity object is created.
How do you check for this? With kubectl get csistoragecapacities or kubectl get --all-namespaces csistoragecapacities?
CSIStorageCapacity objects are namespaced, so the second command has to be used.
I tried to reproduce the issue with csi-driver-host-path v1.10.0, but there I get new CSIStorageCapacity objects after creating a storage class.
Yes it is indeed namespaced. My kubectl has the default namespace set to the namespace where the CSI plugins are deployed.
My commands:
/deploy/kubernetes-distributed/deploy.sh
kubectl delete storageclass.storage.k8s.io/csi-hostpath-slow
kubectl delete storageclass.storage.k8s.io/csi-hostpath-fast
kubectl get --all-namespaces csistoragecapacity
kubectl create -f deploy/kubernetes-distributed/hostpath/csi-hostpath-storageclass-fast.yaml
kubectl get --all-namespaces csistoragecapacity
From the sequence of your commands I do not see how the controller plugin is deployed before the node plugins. I think the order of deployment may be crucial to reproducing this issue. Could you make sure that step 2 happens before the node plugins are deployed? Thank you for your trouble.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/reopen
/assign
@pohly: Reopened this issue.
@samuelluohaoen1: it looks like you are using a central controller for your CSI driver. Is that correct?
Can you perhaps share the external-provisioner log at level >= 5? There is code which should react to changes in the node and CSIDriver objects when the node plugin gets registered after the controller has started.
We don't have a CSI driver deployment readily available to test this scenario. I tried reproducing it through unit tests (see #942) but the code worked as expected.
/remove-lifecycle rotten