OKD 3.11 installation failure
gireeshpunathil opened this issue · 6 comments
Description
I tried installing on a group of RHEL machines. Towards the end of the installation, while the openshift_control_plane role is being executed, I hit the error shown below, from which the installer never recovers. Eventually the installation terminates with an error.
Version
OKD 3.11
Ansible version, per ansible --version:
ansible 2.7.10
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /bin/ansible
python version = 2.7.5 (default, Jun 11 2019, 12:19:05) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
Steps To Reproduce
- Download the OKD installer (openshift-ansible).
- Run the prerequisites playbook, then run the deployment playbook (deploy_cluster.yml); see the sketch below.
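A minimal sketch of that sequence, assuming the openshift-ansible release-3.11 playbooks and the default inventory at /etc/ansible/hosts (adjust the paths to your checkout or RPM install):

cd openshift-ansible
ansible-playbook -i /etc/ansible/hosts playbooks/prerequisites.yml
ansible-playbook -i /etc/ansible/hosts playbooks/deploy_cluster.yml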
Expected Results
OKD up and running
Error output:
TASK [openshift_control_plane : Wait for control plane pods to appear] **********************************
FAILED - RETRYING: Wait for control plane pods to appear (60 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (59 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (58 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (57 retries left).
FAILED - RETRYING: Wait for control plane pods to appear (56 retries left).
Observed Results
The "Wait for control plane pods to appear" task keeps failing until it exhausts its retries; the control plane pods never come up and the installation terminates with an error.
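As far as I understand the role, this task just polls the kube-system namespace for the master static pods, so the same thing can be checked by hand on the master; the paths and service name below assume a stock OKD (origin) 3.11 layout:

ls /etc/origin/node/pods/   # static pod definitions the installer writes (apiserver, controller, etcd)
systemctl status origin-node   # the node service that actually runs those static pods
oc get pods -n kube-system --config=/etc/origin/master/admin.kubeconfig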
Additional Information
Thanks in advance. I can easily reproduce, so any debugging tips, much appreciated!
Additional info: when I executed individual tasks in deploy_cluster.yml, I see warnings from control_plane.yml such as:
/etc/ansible/hosts did not meet host_list requirements, check plugin documentation if this is unexpected
/etc/ansible/hosts did not meet script requirements, check plugin documentation if this is unexpected
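Those two warnings are normally harmless: Ansible tries its inventory plugins in order (host_list, script, and eventually ini), and a plain INI hosts file simply does not match the first two. To double-check that the inventory is parsed the way you expect, something like the following should work (assuming the default inventory path):

ansible-inventory -i /etc/ansible/hosts --graph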
Can you post your inventory file so we can check the config? I had a similar problem when my inventory file used different names. By the way, you don't say which type of OpenShift you are deploying, OKD or Enterprise. OKD works best on CentOS.
I am working with OKD. Here is the inventory:
# cat /etc/ansible/hosts
[masters]
M1.fyre.ibm.com openshift_ip=9.X.X.45 openshift_public_hostname="M1.fyre.ibm.com"
[etcd]
M1.fyre.ibm.com openshift_ip=9.X.X.45 openshift_public_hostname="M1.fyre.ibm.com"
[nodes]
M1.fyre.ibm.com openshift_node_group_name='node-config-master-infra' openshift_ip=9.X.X.45 openshift_public_hostname="M1.fyre.ibm.com"
M2.fyre.ibm.com openshift_node_group_name='node-config-compute' openshift_ip=9.X.X.224 openshift_public_hostname="M2.fyre.ibm.com"
M3.fyre.ibm.com openshift_node_group_name='node-config-compute' openshift_ip=9.X.X.171 openshift_public_hostname="M3.fyre.ibm.com"
M4.fyre.ibm.com openshift_node_group_name='node-config-compute' openshift_ip=9.X.X.124 openshift_public_hostname="M4.fyre.ibm.com"
# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd
#glusterfs
[masters]
M1.fyre.ibm.com
# host group for etcd
[etcd]
M1.fyre.ibm.com
[OSEv3:vars]
ansible_user=root
openshift_deployment_type=origin
openshift_release=3.11
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
openshift_master_default_subdomain=9.X.X.45.nip.io
openshift_master_cluster_hostname=M1.fyre.ibm.com
openshift_disable_check=docker_storage
debug_level=2
ansible_service_broker_install=false
#openshift_storage_glusterfs_namespace=app-storage
#openshift_storage_glusterfs_storageclass=true
#openshift_storage_glusterfs_storageclass_default=true
#openshift_storage_glusterfs_block_deploy=true
#openshift_storage_glusterfs_block_host_vol_size=100
#openshift_storage_glusterfs_block_storageclass=true
#openshift_storage_glusterfs_block_storageclass_default=false
#openshift_schedulable=true
openshift_docker_selinux_enabled=False
openshift_docker_options="--signature-verification=false --insecure-registry=10.30.0.0/16 --log-opt max-size=1M --log-opt max-file=3 --disable-legacy-registry=true"
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'my-os-admin': 'XXXX'} #encrypted password of: S3cure-icp-wordP*s?
openshift_master_cluster_public_hostname=M1.fyre.ibm.com
openshift_master_api_port=7443
openshift_master_console_port=7443
openshift_hostname_check=false
openshift_disable_check=memory_availability
#[glusterfs]
#M2.fyre.ibm.com glusterfs_devices='[ "/dev/vdb" ]'
#M3.fyre.ibm.com glusterfs_devices='[ "/dev/vdb" ]'
#M4.fyre.ibm.com glusterfs_devices='[ "/dev/vdb" ]'
#
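One note on openshift_master_htpasswd_users: the value is expected to be an htpasswd hash, not the plain-text password. A rough example of generating one (htpasswd comes from the httpd-tools package; the placeholder password is just for illustration):

htpasswd -nb my-os-admin '<password>'   # prints my-os-admin:<hash>; use only the hash part as the dict value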
Have you resolved the issue?
Did the master API start? Could you please paste the output of 'docker ps'?
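For example, something along these lines should show whether the control plane containers came up at all, and pull the API server logs if one of them is crash-looping (the container naming and the master-logs helper are assumptions based on a stock origin 3.11 install):

docker ps --format '{{.Names}}\t{{.Status}}' | grep -E 'k8s_(api|controllers|etcd)'
/usr/local/bin/master-logs api api   # API server logs, if the container exists but keeps restarting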
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
The issue and its context are no longer available; closing.