OKD 3.11 installation failure
gireeshpunathil opened this issue · 6 comments
Description
I tried installing on a group of RHEL machines. Towards the end of the installation, while the openshift_control_plane role is being executed, I hit the error shown below, from which the installer never recovers. Eventually the installation terminates with an error.
Version
OKD 3.11
Ansible version, per ansible --version:
ansible 2.7.10
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /bin/ansible
python version = 2.7.5 (default, Jun 11 2019, 12:19:05) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
Steps To Reproduce
- Download the OKD installer (openshift-ansible).
- Run the prerequisites playbook, then run the deployment playbook (deploy_cluster.yml); see the sketch below.
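A minimal sketch of that sequence, assuming the openshift-ansible release-3.11 playbooks and the default inventory at /etc/ansible/hosts (adjust the paths to your checkout or RPM install):

cd openshift-ansible
ansible-playbook -i /etc/ansible/hosts playbooks/prerequisites.yml
ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml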
Expected Results
OKD up and running
Error output:
TASK [openshift_control_plane : Wait for control plane pods to appear] **********************************
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (58 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (57 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (56 retries left).
Observed Results
The "Wait for control plane pods to appear" task keeps failing until it exhausts its retries; the control plane pods never come up and the installation terminates with an error.
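As far as I understand the role, this task just polls the kube-system namespace for the master static pods, so the same thing can be checked by hand on the master; the paths and service name below assume a stock OKD (origin) 3.11 layout:

ls /etc/origin/node/pods/   # static pod definitions the installer writes (apiserver, controller, etcd)
systemctl status origin-node   # the node service that actually runs those static pods
oc get pods -n kube-system --config=/etc/origin/master/admin.kubeconfig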
Additional Information
Thanks in advance. I can easily reproduce, so any debugging tips, much appreciated!
Additional info: when I executed individual tasks in deploy_cluster.yml, I see warnings from control_plane.yml such as:
/etc/ansible/hosts did not meet host_list requirements, check plugin documentation if this is unexpected
/etc/ansible/hosts did not meet script requirements, check plugin documentation if this is unexpected
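Those two warnings are normally harmless: Ansible tries its inventory plugins in order (host_list, script, and eventually ini), and a plain INI hosts file simply does not match the first two. To double-check that the inventory is parsed the way you expect, something like the following should work (assuming the default inventory path):

ansible-inventory -i /etc/ansible/hosts --graph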
Can you post your inventory file so we can check the config? I had a similar problem when my inventory file used different names. By the way, you don't say which type of OpenShift you are deploying, OKD or Enterprise. OKD works best on CentOS.
I am working with OKD. Here is the inventory:
# cat /etc/ansible/hosts
[masters]
M1.fyre.ibm.com openshift_ip=9.X.X.45 openshift_public_hostname="M1.fyre.ibm.com"
[etcd]
M1.fyre.ibm.com openshift_ip=9.X.X.45 openshift_public_hostname="M1.fyre.ibm.com"
[nodes]
M1.fyre.ibm.com openshift_node_group_name='node-config-master-infra' openshift_ip=9.X.X.45 openshift_public_hostname="M1.fyre.ibm.com"
M2.fyre.ibm.com openshift_node_group_name='node-config-compute' openshift_ip=9.X.X.224 openshift_public_hostname="M2.fyre.ibm.com"
M3.fyre.ibm.com openshift_node_group_name='node-config-compute' openshift_ip=9.X.X.171 openshift_public_hostname="M3.fyre.ibm.com"
M4.fyre.ibm.com openshift_node_group_name='node-config-compute' openshift_ip=9.X.X.124 openshift_public_hostname="M4.fyre.ibm.com"
# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd
#glusterfs
[masters]
M1.fyre.ibm.com
# host group for etcd
[etcd]
M1.fyre.ibm.com
[OSEv3:vars]
ansible_user=root
openshift_deployment_type=origin
openshift_release=3.11
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
openshift_master_default_subdomain=9.X.X.45.nip.io
openshift_master_cluster_hostname=M1.fyre.ibm.com
openshift_disable_check=docker_storage
debug_level=2
ansible_service_broker_install=false
#openshift_storage_glusterfs_namespace=app-storage
#openshift_storage_glusterfs_storageclass=true
#openshift_storage_glusterfs_storageclass_default=true
#openshift_storage_glusterfs_block_deploy=true
#openshift_storage_glusterfs_block_host_vol_size=100
#openshift_storage_glusterfs_block_storageclass=true
#openshift_storage_glusterfs_block_storageclass_default=false
#openshift_schedulable=true
openshift_docker_selinux_enabled=False
openshift_docker_options="--signature-verification=false --insecure-registry=10.30.0.0/16 --log-opt max-size=1M --log-opt max-file=3 --disable-legacy-registry=true"
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'my-os-admin': 'XXXX'} #encrypted password of: S3cure-icp-wordP*s?
openshift_master_cluster_public_hostname=M1.fyre.ibm.com
openshift_master_api_port=7443
openshift_master_console_port=7443
openshift_hostname_check=false
openshift_disable_check=memory_availability
#[glusterfs]
#M2.fyre.ibm.com glusterfs_devices='[ "/dev/vdb" ]'
#M3.fyre.ibm.com glusterfs_devices='[ "/dev/vdb" ]'
#M4.fyre.ibm.com glusterfs_devices='[ "/dev/vdb" ]'
#
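One note on openshift_master_htpasswd_users: the value is expected to be an htpasswd hash, not the plain-text password. A rough example of generating one (htpasswd comes from the httpd-tools package; the placeholder password is just for illustration):

htpasswd -nb my-os-admin '<password>'   # prints my-os-admin:<hash>; use only the hash part as the dict value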
Have you resolved the issue?
Did the master API start? Could you please paste the output of 'docker ps'?
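For example, something along these lines should show whether the control plane containers came up at all, and pull the API server logs if one of them is crash-looping (the container naming and the master-logs helper are assumptions based on a stock origin 3.11 install):

docker ps --format '{{.Names}}\t{{.Status}}' | grep -E 'k8s_(api|controllers|etcd)'
/usr/local/bin/master-logs api api   # API server logs, if the container exists but keeps restarting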
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
The issue and its context are no longer available; closing.