openshift/openshift-ansible

FAILED - RETRYING: Wait for control plane pods to appear (v3.11/poor etcd listen host)

Closed this issue · 9 comments

Description

Behaviour very similar to issue #9575.

Here I'm deploying 3.11 to bare metal, and the openshift_control_plane : Wait for control plane pods to appear task fails with the same error as #9575:

 The connection to the server XYZ was refused - did you specify the right host or port?
Version

Ansible version

$ ansible --version
ansible 2.7.13
  config file = None
  configured module search path = [u'/home/centos/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/centos/.local/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug  7 2019, 00:51:29) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

OpenShift Ansible tag:

openshift-ansible-3.11.152-1
Steps To Reproduce
  1. Run the deploy_cluster playbook
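
For reference, a deploy run with the RPM-installed playbooks looks roughly like this; the inventory path is a placeholder for your own file:

    # Assumes openshift-ansible was installed from the RPM; paths are illustrative
    ansible-playbook -i ~/inventory.yml \
        /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml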
Expected Results

Control plane pods should appear.

Observed Results

The control-plane pods do not appear.

The output is essentially the same as in issue #9575. I lost my own output, so that issue's error (which is essentially identical) is reproduced here:

TASK [openshift_control_plane : Wait for control plane pods to appear] ************************************************************************************************************************************************************
Tuesday 14 August 2018  16:39:24 +0800 (0:00:00.086)       0:22:42.301 ******** 
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
...............
FAILED - RETRYING: Wait for control plane pods to appear (1 retries left).
failed: [10.10.244.212] (item=__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251) => {"attempts": 60, "changed": false, "item": "__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251", "msg": {"cmd": "/bin/oc get pod master-__omit_place_holder__5e245b7f796113e2f9ba55e6c4a882ef0471a251-10.10.244.212 -o json -n kube-system", "results": [{}], "returncode": 1, "stderr": "The connection to the server 10.10.244.212:8443 was refused - did you specify the right host or port?\n", "stdout": ""}}
Additional Information

As discussed in the related issue, the etcd pod listens on a specific IP address, as can be seen in that pod's logs (in this case it listens on 134.93.174.200:2379), but the API server connects to 127.0.0.1:2379.
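
A quick way to confirm the mismatch on the affected master is sketched below; master-logs is the log helper shipped on 3.10/3.11 masters, and the grep pattern assumes etcd's usual startup log line, so adjust as needed:

    # Which addresses is etcd actually bound to?
    ss -tlnp | grep 2379

    # Which client URLs did etcd announce at startup?
    /usr/local/bin/master-logs etcd etcd 2>&1 | grep -i 'listening for client'

If 127.0.0.1:2379 is missing from the ss output, the API server's loopback connection will be refused, which matches the error above.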

The workaround, which avoids the installation error, is to override the listen host using etcd_listen_client_urls.

In my YAML-based inventory I add this...

all:
  children:
    OSEv3:
      vars:
        etcd_listen_client_urls: 'https://0.0.0.0:2379'

And it works!
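
For anyone using a classic INI inventory instead of YAML, the equivalent (not tested here, but it is the same variable) would be:

    [OSEv3:vars]
    etcd_listen_client_urls=https://0.0.0.0:2379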

Is it time to ensure that the etcd pod, out of the box, listens on 0.0.0.0:2379?

I used the temporary solution posted by danielkucera on 8th August in issue #6986.

I am facing the same issue; I will try the approach mentioned here.

It didn't work T T

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.