OCP Cluster creation is failing in powervs using rhcos-413
mdafsanhossain opened this issue · 8 comments
PowerVS VMs go into shutoff state during cluster creation when rhcos-413.92.202306140611-0-powervs.ppc64le.ova.gz
is used as the image.
Terraform automation is stuck at 70% and throws remote-exec provisioner error
How to reproduce?
var.tfvars
ibmcloud_zone = "mon01"
service_instance_id = "8994821d-cdad-4df8-bf2b-d577a890ac24"
ibmcloud_api_key = ""
rhel_image_name = "rhel-86-05162022-tier1"
rhcos_import_image = true
rhcos_import_image_filename = "rhcos-413.92.202306140611-0-powervs.ppc64le.ova.gz"
rhcos_import_image_storage_type = "tier1"
system_type = "s922"
network_name = "ocp-private-network"
openshift_install_tarball = "https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/4.13.4/openshift-client-linux.tar.gz"
openshift_client_tarball = "https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/4.13.4/openshift-client-linux.tar.gz"
pull_secret_file = "/home/ubuntu/pull-secret.txt"
rhel_subscription_username = ""
rhel_subscription_password = ""
## Small Configuration Template
bastion = { memory = "16", processors = "0.5", "count" = 1 }
bootstrap = { memory = "32", processors = "0.5", "count" = 1 }
master = { memory = "32", processors = "0.5", "count" = 3 }
worker = { memory = "32", processors = "0.5", "count" = 2 }
storage_type = "nfs"
volume_size = "200"
cluster_id_prefix = "rhcos-test"
cluster_domain = "nip.io"
Created cluster using https://github.com/ocp-power-automation/openshift-install-power/blob/devel/openshift-install-powervs
./openshift-install-powervs create -var-file var.tfvars
cc: @yussufsh
Shutting down rhcos VMs is part of the install process. Can you show me what errors you are seeing in the console?
This is the terraform log. VMs in the shut off state are not coming back up.
[verify_data] Found id_rsa & id_rsa.pub in current directory
[powervs_login] Trying to login with the provided IBMCLOUD_API_KEY...
[powervs_login] Targeting 'oss-community-resources-montreal' with Id crn:v1:bluemix:public:power-iaas:mon01:a/108655d3ff9e4489b1c29e83df48623d:8994821d-cdad-4df8-bf2b-d577a890ac24::
[init_terraform] Initializing Terraform plugins and validating the code...
[apply] Running terraform apply... please wait
Attempt: 1/5
[retry_terraform] Encountered below errors: (70%)
│ Error: remote-exec provisioner error
[retry_terraform] WARN: Issues were seen while running the terraform command. Attempting to run again...
Attempt: 2/5
[retry_terraform] Encountered below errors: (70%)
│ Error: remote-exec provisioner error
[retry_terraform] WARN: Issues were seen while running the terraform command. Attempting to run again...
Attempt: 3/5
[retry_terraform] Encountered below errors: (70%)
│ Error: remote-exec provisioner error
[retry_terraform] WARN: Issues were seen while running the terraform command. Attempting to run again...
Attempt: 4/5
[retry_terraform] Encountered below errors: (70%)
│ Error: remote-exec provisioner error
[retry_terraform] WARN: Issues were seen while running the terraform command. Attempting to run again...
Attempt: 5/5
I don't have the log from the VMs themselves
As I said the VMs are supposed to be in shutoff state during this stage of install. Can you attach the last log file from logs
folder?
Here you go,
What is the status of bootatrap node? If it is shutdown please start and rerun the script. Else check the vm console and see what is the info.
What is the status of bootatrap node? If it is shutdown please start and rerun the script. Else check the vm console and see what is the info.
it was in Warning
state. Creating the cluster from scratch again
@yussufsh I noticed that the bootstrap node becomes unreachable after it comes back up and it gets stuck.
I tried installing 4.12 this time. It goes to 114% and gets stuck. Bootstrap node is accessible.
ocp4-upi-powervs_20230627075921_apply_1.log
ocp4-upi-powervs_20230627075921_apply_3.log