Installer failed/timed out at master-0 node config step
oaomer commented
I am trying to use this project to install OCP 4.10 on a tech zone/CECC kit. I followed the README instructions and vars-powervm.yaml looks correct to me, yet the installation consistently fails with a timeout during TASK [nodes-config : Check connection] for the master-0 node. The timeout is set to 2700s (45 min) in the vars YAML file; the task waited the full 45 minutes and then failed.
TASK [ocp-config : Skip config if install workdir exist] **************************************************************************
ok: [129.40.126.241]
TASK [ocp-config : meta] **********************************************************************************************************
PLAY [Check and configure bootstrap node] *****************************************************************************************
TASK [nodes-config : Check connection] ********************************************************************************************
ok: [129.40.126.242]
TASK [nodes-config : Configure node] **********************************************************************************************
[WARNING]: Distribution redhat 4.10 on host 129.40.126.242 should use /usr/bin/python, but is using /usr/libexec/platform-python, since the discovered platform python interpreter was not present. See https://docs.ansible.com/ansible-core/2.12/reference_appendices/interpreter_discovery.html for more information.
changed: [129.40.126.242]
PLAY [Check and configure control-plane nodes] ************************************************************************************
TASK [nodes-config : Check connection] ********************************************************************************************
fatal: [129.40.126.243]: FAILED! => {"changed": false, "elapsed": 2715, "msg": "timed out waiting for ping module test: Failed to connect to the host via ssh: ssh: connect to host 129.40.126.243 port 22: Connection refused"}
NO MORE HOSTS LEFT ****************************************************************************************************************
PLAY RECAP ************************************************************************************************************************
129.40.126.241 : ok=141 changed=55 unreachable=0 failed=0 skipped=135 rescued=1 ignored=0
129.40.126.242 : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
129.40.126.243 : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
[root@p664-bastion ocp4-upi-powervm-hmc]#
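The failure above reports "Connection refused" on port 22 rather than a silent timeout, and the two usually mean different things: a refusal generally indicates that something at that address answered but nothing is listening on the port (or a firewall actively rejected the connection), while a timeout generally indicates that no reply came back at all. A minimal sketch for checking this by hand from the bastion, assuming Python 3 is available there; `probe_tcp` is a hypothetical helper name, not part of this project:

```python
import errno
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a single TCP connection attempt to host:port.

    Returns "open", "refused", "timeout", or "error: <errno name>".
    """
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or a firewall) answered with a reset: nothing is listening.
        return "refused"
    except socket.timeout:
        # No answer within `timeout` seconds.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. EHOSTUNREACH or ENETUNREACH.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
```

Calling `probe_tcp("129.40.126.243")` (the master-0 address from the play recap) while the playbook is waiting would show whether the node is still refusing SSH. If it keeps returning "refused", one possibility is that the node never finished booting far enough to start sshd, but that is only a guess from this output.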
There is no useful information in the logs, even after raising the log level to debug. The only thing I see is this block, repeated in /var/log/messages every 30 seconds:
Oct 19 01:21:57 p664-bastion systemd[1]: helper-tftp.service: Succeeded.
Oct 19 01:22:27 p664-bastion systemd[1]: helper-tftp.service: Service RestartSec=30s expired, scheduling restart.
Oct 19 01:22:27 p664-bastion systemd[1]: helper-tftp.service: Scheduled restart job, restart counter is at 16735.
Oct 19 01:22:27 p664-bastion systemd[1]: Stopped Starts TFTP on boot because of reasons.
Oct 19 01:22:27 p664-bastion systemd[1]: Started Starts TFTP on boot because of reasons.
Oct 19 01:22:27 p664-bastion systemd[1]: helper-tftp.service: Succeeded.
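The restart counter in the log block above can be turned into a rough duration, which helps judge whether the loop started with this install attempt or long before it. A small sketch that pulls the counter and the `RestartSec` value out of the syslog text; it assumes the interval stayed constant and the counter was never reset, and `restart_loop_summary` is a hypothetical helper name:

```python
import re
from typing import Optional

# Patterns match the systemd messages exactly as they appear in the log above.
RESTART_RE = re.compile(r"Scheduled restart job, restart counter is at (?P<count>\d+)\.")
RESTART_SEC_RE = re.compile(r"RestartSec=(?P<sec>\d+)s expired")

SAMPLE = """\
Oct 19 01:22:27 p664-bastion systemd[1]: helper-tftp.service: Service RestartSec=30s expired, scheduling restart.
Oct 19 01:22:27 p664-bastion systemd[1]: helper-tftp.service: Scheduled restart job, restart counter is at 16735.
"""


def restart_loop_summary(log_text: str) -> Optional[dict]:
    """Estimate how long a unit has been restart-looping.

    Returns None when the text has no restart counter or RestartSec line.
    """
    counts = [int(m.group("count")) for m in RESTART_RE.finditer(log_text)]
    intervals = [int(m.group("sec")) for m in RESTART_SEC_RE.finditer(log_text)]
    if not counts or not intervals:
        return None
    restarts = max(counts)      # highest counter value seen
    interval_s = intervals[-1]  # most recent RestartSec value
    return {
        "restarts": restarts,
        "interval_s": interval_s,
        # Lower bound: ignores the time the unit spends running each cycle.
        "approx_hours": round(restarts * interval_s / 3600, 1),
    }


print(restart_loop_summary(SAMPLE))
```

For the lines quoted here this gives 16735 restarts at 30 s each, or roughly 139.5 hours. That is several days of looping, so under the assumptions above the loop began well before this run, and it does not by itself show that TFTP is at fault.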
marcopain commented
I am facing the exact same issue. Is there any solution for this?