openshift/openshift-ansible

FAILED - RETRYING: Wait for control plane pods to appear (v3.11/poor etcd listen host)

Closed this issue · 9 comments

Description

Behaviour very similar to issue #9575.

Here I'm deploying 3.11 to bare metal, and the openshift_control_plane : Wait for control plane pods to appear task fails with the same error as #9575:

 The connection to the server XYZ was refused - did you specify the right host or port?
Version

Ansible version

$ ansible --version
ansible 2.7.13
  config file = None
  configured module search path = [u'/home/centos/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/centos/.local/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

OpenShift Ansible tag:

openshift-ansible-3.11.152-1
Steps To Reproduce
  1. Run the deploy_cluster playbook
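
For reference, a deploy run with the RPM-installed playbooks looks roughly like this; the inventory path is a placeholder for your own file:

    # Assumes openshift-ansible was installed from the RPM; paths are illustrative
    ansible-playbook -i ~/inventory.yml \
        /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml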
Expected Results

Control plane pods should appear.

Observed Results

The control-plane pods do not appear.

The output is essentially the same as in issue #9575. I lost my own output, so that issue's error (which is essentially identical) is reproduced here:

TASK [openshift_control_plane : Wait for control plane pods to appear] ************************************************************************************************************************************************************
Tuesday 14 August 2018  16:39:24 +0800 (0:00:00.086)       0:22:42.301 ******** 
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
...............
FAILED - RETRYING: Wait for control plane pods to appear (1 retries left).
failed: [10.10.244.212] (item=__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251) => {"attempts": 60, "changed": false, "item": "__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251", "msg": {"cmd": "/bin/oc get pod master-__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251-10.10.244.212 -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "The connection to the server 10.10.244.212:8443 was refused - did you specify the right host or port?\n", "stdout": ""}}
Additional Information

As discussed in the related issue, the etcd pod listens on a specific IP address, as can be seen in that pod's logs (in this case it listens on 134.93.174.200:2379), but the API server connects to 127.0.0.1:2379.
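
A quick way to confirm the mismatch on the affected master is sketched below; master-logs is the log helper shipped on 3.10/3.11 masters, and the grep pattern assumes etcd's usual startup log line, so adjust as needed:

    # Which addresses is etcd actually bound to?
    ss -tlnp | grep 2379

    # Which client URLs did etcd announce at startup?
    /usr/local/bin/master-logs etcd etcd 2>&1 | grep -i 'listening for client'

If 127.0.0.1:2379 is missing from the ss output, the API server's loopback connection will be refused, which matches the error above.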

The workaround, which avoids the installation error, is to override the listen host using etcd_listen_client_urls.

In my YAML-based inventory I add this...

all:
  children:
    OSEv3:
      vars:
        etcd_listen_client_urls: 'https://0.0.0.0:2379'

And it works!
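
For anyone using a classic INI inventory instead of YAML, the equivalent (not tested here, but it is the same variable) would be:

    [OSEv3:vars]
    etcd_listen_client_urls=https://0.0.0.0:2379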

Is it time to ensure that the etcd pod, out of the box, listens on 0.0.0.0:2379?

I used the temporary solution posted by danielkucera on 8th August in issue #6986.

I am facing the same issue; I will try the approach mentioned here.

It didn't work T T

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.