kubernetes/cloud-provider-openstack

Can't update label after live migration

pyama86 opened this issue · 9 comments

Is this a BUG REPORT:

/kind bug

What happened:
I'm not sure, but I suspect the topology label isn't updated after an instance is live-migrated.

I moved an instance from ourzone-1 (AZ) to ourzone-2 (AZ), and the logs below appeared.

% kubectl logs csi-cinder-nodeplugin-5vwrk node-driver-registrar
I0228 05:44:31.902073       1 main.go:113] Version: v2.1.0
I0228 05:44:31.903166       1 connection.go:153] Connecting to unix:///csi/csi.sock
I0228 05:44:31.907211       1 node_register.go:52] Starting Registration Server at: /registration/cinder.csi.openstack.org-reg.sock
I0228 05:44:31.908329       1 node_register.go:61] Registration Server started at: /registration/cinder.csi.openstack.org-reg.sock
I0228 05:44:31.908528       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0228 05:44:33.438290       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0228 05:44:33.854681       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "topology.cinder.csi.openstack.org/zone":"ourzone-2" but existing label is "topology.cinder.csi.openstack.org/zone":"ourzone-1",}
E0228 05:44:33.854737       1 main.go:92] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "topology.cinder.csi.openstack.org/zone":"ourzone-2" but existing label is "topology.cinder.csi.openstack.org/zone":"ourzone-1", restarting registration container.

I solved it by removing the label.

kubectl label node xxx topology.cinder.csi.openstack.org/zone-
kubectl label node xxx topology.kubernetes.io/zone-
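
The same cleanup can also be expressed as one JSON merge patch (RFC 7386), in which a label set to `null` is deleted. The Python sketch below only builds that patch. The two label keys come from the commands above; the function name `label_removal_patch` is hypothetical and only for illustration.

```python
import json

# The two zone label keys removed by the kubectl commands above.
STALE_ZONE_LABELS = (
    "topology.cinder.csi.openstack.org/zone",
    "topology.kubernetes.io/zone",
)


def label_removal_patch(keys):
    """Build a JSON merge patch that deletes the given node labels.

    In a JSON merge patch, a key whose value is null is removed from the target.
    """
    return {"metadata": {"labels": {key: None for key in keys}}}


if __name__ == "__main__":
    # json.dumps renders Python None as JSON null.
    print(json.dumps(label_removal_patch(STALE_ZONE_LABELS)))
```

The printed JSON could then be passed to `kubectl patch node xxx --type=merge -p '...'`. Like the `kubectl label ... -` commands, it only removes the stale labels; it does not set new values.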

Is this the expected behavior?

What you expected to happen:
I expected the label to be updated automatically after live migration.

How to reproduce it:
Live-migrate an instance to another AZ.

Anything else we need to know?:

Environment:

  • openstack-cloud-controller-manager (or other related binary) version: 1.22
  • OpenStack version: Newton
  • Others:
gman0 commented

The labels will not be overridden if they are already set -- that's why you see kubelet complaining. So something must remove them. I'm not sure whose responsibility this would be, though. Admin's doing the migration? OCCM's?
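
To make the "set only if absent" behaviour concrete, here is a simplified Python model of it. This is an illustrative sketch, not kubelet's actual code: `merge_topology_labels` and `TopologyCollisionError` are hypothetical names, and only the error wording is copied from the registrar log in the report.

```python
class TopologyCollisionError(Exception):
    """Raised when a driver-reported topology value conflicts with an existing label."""


def merge_topology_labels(node_labels, driver_topology):
    """Return a copy of node_labels with the driver's topology keys merged in.

    Simplified model: a missing key is added, an identical value is a no-op,
    and an existing key with a different value is never overwritten; it is
    reported as a collision instead.
    """
    merged = dict(node_labels)
    for key, reported in driver_topology.items():
        if key not in merged:
            merged[key] = reported
        elif merged[key] != reported:
            raise TopologyCollisionError(
                f'detected topology value collision: driver reported "{key}":"{reported}" '
                f'but existing label is "{key}":"{merged[key]}"'
            )
    return merged


# After the live migration, the driver reports ourzone-2 while the node still carries ourzone-1.
labels = {"topology.cinder.csi.openstack.org/zone": "ourzone-1"}
try:
    merge_topology_labels(labels, {"topology.cinder.csi.openstack.org/zone": "ourzone-2"})
except TopologyCollisionError as err:
    print(err)
```

In this model the only way forward is to change the existing label first, which matches the workaround of deleting the labels so the next registration attempt can add the new zone.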

Admin's doing the migration? OCCM's?

Yes. Again, my question is: existing labels can't be updated after live migration, and the Cinder CSI provider isn't taking that into account?
Or is it the job of kubelet to update the label?

It seems live migrating from one zone to another is not recommended.
I am not sure whether OCCM/CSI OpenStack needs to support it, but it's worth discussing here.

https://docs.openstack.org/nova/latest/admin/availability-zones.html

Knowing this, it is dangerous to force a server to another host with evacuate or live migrate if the server is restricted to a zone and is then forced to move to a host in another zone, because that will create an inconsistency in the internal tracking of where that server should live and may require manually updating the database for that server. For example, if a user creates a server in zone A and then the admin force live migrates the server to zone B, and then the user resizes the server, the scheduler will try to move it back to zone A which may or may not work, e.g. if the admin deleted or renamed zone A in the interim.
gman0 commented

My last comment was intended as a question to others in this repo, not to you @pyama86 specifically -- so just thinking out loud :) However, I'm not sure this should be cinder-csi's job, because (1) a CSI driver doesn't know about node labels or anything else on the node object, and (2) there are other, non-cinder-csi labels that would need updating in addition to topology.cinder.csi.openstack.org.

@lingxiankong @ramineni do you think OCCM could help in updating node labels after live migration across AZs?

gman0 commented

@jichenjc excellent catches, thanks!

If OCCM has a policy of updating labels, I'd like to write a patch. What do you think?

It's the cloud provider that has such support (though I'm not sure whether it considers the live migration case),
but technically I think we should update the label after live migration (if the cloud allows it), so I agree we can consider a PR.
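
One possible shape for such a PR, sketched in Python under loud assumptions: the instance's current AZ is already known (for example, looked up from Nova after the migration), and a controller rewrites any zone label that still points at the old AZ. `zone_label_patch` and `ZONE_LABEL_KEYS` are hypothetical, and this is not existing OCCM code; which component should own the cinder-csi label is exactly the open question above.

```python
# Hypothetical set of zone labels a controller might keep in sync with the instance's AZ.
ZONE_LABEL_KEYS = (
    "topology.kubernetes.io/zone",
    "topology.cinder.csi.openstack.org/zone",
)


def zone_label_patch(node_labels, current_zone, keys=ZONE_LABEL_KEYS):
    """Return a JSON merge patch that re-points stale zone labels at current_zone.

    Only keys that are already present with a different value are touched;
    missing keys are left for whatever normally sets them. Returns None when
    no label needs to change.
    """
    stale = {
        key: current_zone
        for key in keys
        if key in node_labels and node_labels[key] != current_zone
    }
    if not stale:
        return None
    return {"metadata": {"labels": stale}}


# Example: node still labelled with the pre-migration zone.
labels = {
    "topology.kubernetes.io/zone": "ourzone-1",
    "topology.cinder.csi.openstack.org/zone": "ourzone-1",
}
print(zone_label_patch(labels, "ourzone-2"))
```

Touching only labels that already exist keeps the sketch from taking over label creation from kubelet or the CSI driver; it only corrects values that went stale.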

Based on the information you gave me, I looked into the implementation, and it seems this should be handled by kubelet, so I created an issue in Kubernetes.
Thanks @jichenjc @gman0