Not able to "Commission" or "Discover" a node which is in "Disappeared" state.

Question

pradvara opened this issue 8 years ago · 1 comment

Steps to reproduce:

Commissioned a new "un-allocated" (discovered) node on the controller.
During commissioning, the servers lost network connectivity.
I could not ping the worker node, so I manually rebooted it to bring the server back up.
The worker node's status was then "Disappeared".
I was not able to rediscover or recommission the server.

ON CONTROL NODE:
contiv-r3-FCH1726V18M: Inventory State
contiv-r3-FCH1726V18M:     name: contiv-r3-FCH1726V18M
contiv-r3-FCH1726V18M:     prev_state: Disappeared
contiv-r3-FCH1726V18M:     prev_status: Provisioning
contiv-r3-FCH1726V18M:     state: Disappeared
contiv-r3-FCH1726V18M:     status: Unallocated
contiv-r3-FCH1726V18M: Monitoring State
contiv-r3-FCH1726V18M:     label: contiv-r3
contiv-r3-FCH1726V18M:     management_address: 10.106.240.115
contiv-r3-FCH1726V18M:     serial_number: FCH1726V18M
contiv-r3-FCH1726V18M: Configuration State
contiv-r3-FCH1726V18M:     host_group: service-worker
contiv-r3-FCH1726V18M:     inventory_name: contiv-r3-FCH1726V18M
contiv-r3-FCH1726V18M:     inventory_vars:
contiv-r3-FCH1726V18M:         etcd_master_addr: 10.106.240.111
contiv-r3-FCH1726V18M:         etcd_master_name: contiv-b2-FCH1701J2KV
contiv-r3-FCH1726V18M:         node_addr: 10.106.240.115
contiv-r3-FCH1726V18M:         node_name: contiv-r3-FCH1726V18M
contiv-r3-FCH1726V18M:     ssh_address: 10.106.240.115

[stack@contiv-b1 ~]$ clusterctl node commission contiv-r3-FCH1726V18M --extra-vars='{"env" : {"http_proxy": "http://proxy-wsa.esl.cisco.com:80","https_proxy": […]om:80"}, "control_interface": "enp6s0", "netplugin_if": "enp7s0", "service_vip": "10.106.240.121"}' --host-group=service-worker
2016/06/23 11:51:34 Request URL: commission/node/contiv-r3-FCH1726V18M Request Body: &{Nodes:[] Addrs:[] HostGroup:service-worker ExtraVars:{"env" : {"http_pro[…]sco.com:80","https_proxy": "http://proxy-wsa.esl.cisco.com:80"}, "control_interface": "enp6s0", "netplugin_if": "enp7s0", "service_vip": "10.106.240.121"} Job:[…]onse status: "500 Internal Server Error". Response body: one or more nodes are not in discovered state, please check their network reachability. Non-discovered[…]M]

[stack@contiv-b1 ~]$ clusterctl discover 10.106.240.115 --extra-vars='{"env" : {"http_proxy": "http://proxy-wsa.esl.cisco.com:80","https_proxy": "http://proxy-[…]rol_interface": "enp6s0"}'
2016/06/23 11:52:09 Request URL: discover/nodes Request Body: &{Nodes:[] Addrs:[10.106.240.115] HostGroup: ExtraVars:{"env" : {"http_proxy": "http://proxy-wsa.[…]y": "http://proxy-wsa.esl.cisco.com:80"}, "control_interface": "enp6s0"} Job: Event:{Name: Nodes:[]}} Response status: "500 Internal Server Error". Response bo[…] exist with the specified management addresses. Existing nodes: [contiv-r3-FCH1726V18M:10.106.240.115]
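The two 500 responses reflect two separate checks: commissioning requires every requested node to be in the Discovered state, while discovery refuses a management address that already belongs to an inventory entry. A node stuck in Disappeared therefore fails both. The Go sketch below models those two checks only to illustrate the dead end; the `Node` type, the state names as Go constants, and `checkCommission`/`checkDiscover` are hypothetical, not cluster-mgr's actual code, and the error strings merely paraphrase the responses above.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeState is a hypothetical model of the inventory "state" field
// printed by clusterctl. It is NOT cluster-mgr's real type.
type NodeState string

const (
	Discovered  NodeState = "Discovered"
	Disappeared NodeState = "Disappeared"
)

// Node is a hypothetical, minimal inventory entry.
type Node struct {
	Name     string
	MgmtAddr string
	State    NodeState
}

// checkCommission mirrors the first error: every requested node must
// currently be in the Discovered state, otherwise the request is refused.
func checkCommission(nodes []Node) error {
	var bad []string
	for _, n := range nodes {
		if n.State != Discovered {
			bad = append(bad, n.Name)
		}
	}
	if len(bad) > 0 {
		return fmt.Errorf("one or more nodes are not in discovered state: [%s]",
			strings.Join(bad, " "))
	}
	return nil
}

// checkDiscover mirrors the second error: an address that already belongs
// to an inventory entry (in any state) cannot be discovered again.
func checkDiscover(inventory []Node, addr string) error {
	for _, n := range inventory {
		if n.MgmtAddr == addr {
			return fmt.Errorf("node already exists with management address: %s:%s",
				n.Name, n.MgmtAddr)
		}
	}
	return nil
}

func main() {
	// Values taken from the inventory output in the issue.
	node := Node{Name: "contiv-r3-FCH1726V18M", MgmtAddr: "10.106.240.115", State: Disappeared}
	inventory := []Node{node}

	// Both operations are rejected for a Disappeared node.
	fmt.Println(checkCommission(inventory))
	fmt.Println(checkDiscover(inventory, node.MgmtAddr))
}
```

The point of the sketch is that neither command changes the node's state by itself, so the state has to move out of Disappeared through some other path before either check can pass.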

Answer 1 · 2016-07-11T23:14:00.000Z

Unfortunately, this is expected behavior: a node in the "Disappeared" state is, by definition, not reachable over the network, so cluster-mgr cannot reach it remotely to commission or rediscover it :(

In this case, the node must be accessed manually (either through a console or physically) and its network connectivity restored.

I will close this issue, but please feel free to reopen it or submit another issue if I missed something.