terraform-ibm-modules/terraform-ibm-mas

Error from server (ServiceUnavailable): the server is currently unable to handle the request (get routes.route.openshift.io)

Closed this issue · 7 comments

Pipeline failed with this error, so tracking here to see how often we hit it in order to know if a deep dive is required:

│ Error: local-exec provisioner error
│ 
│   with module.existing_cluster.null_resource.maximo_admin_url,
│   on ../../main.tf line 125, in resource "null_resource" "maximo_admin_url":
│  125:   provisioner "local-exec" {
│ 
│ Error running command '../../scripts/getAdminURL.sh inst1 ../../url.txt':
│ exit status 1. Output: Error from server (ServiceUnavailable): the server
│ is currently unable to handle the request (get routes.route.openshift.io)
│ 
│ Sleeping for 60 seconds before retrying..
│ No resources found in mas-inst1-core namespace.
│ 
│ Sleeping for 60 seconds before retrying..
│ No resources found in mas-inst1-core namespace.
│ 
│ Sleeping for 60 seconds before retrying..
│ No resources found in mas-inst1-core namespace.
│ 
│ Sleeping for 60 seconds before retrying..
│ No resources found in mas-inst1-core namespace.
│ Admin URL can't be fetched. Something wrong. Please check on Openshift
│ cluster.
│ 
╵
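
The log above shows a fixed-interval retry (60 seconds between attempts) that eventually gives up. As a rough illustration only, and not the actual contents of getAdminURL.sh, a bounded retry loop of that shape could be sketched as follows. The function name `retry` and its arguments are hypothetical.

```shell
# Hypothetical helper (not taken from the repository): run a command up to
# max_attempts times, sleeping delay_seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt is still to come.
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "Sleeping for ${delay_seconds} seconds before retrying.." >&2
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

A caller would wrap whatever command fetches the route, for example `retry 5 60 some_route_check`, where `some_route_check` is a placeholder. Because the helper propagates failure, the surrounding script can stop instead of continuing with an empty URL.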

Looks like the issue was that mongodb failed:
image

As a result, the route was not available. Because mongodb failed, the installVerify.sh script should have failed, and the run should never have reached getAdminURL.sh. That bug needs to be fixed.

Full log from pipeline which hit issue: https://github.com/terraform-ibm-modules/terraform-ibm-mas/actions/runs/8936693737/job/24547676723

The installVerify.sh script should only pass if the output of oc get pr -n "${namespace}" -o=jsonpath='{.items[*].status.conditions[*].reason}' is "Completed". In that case you should see an echo saying "Install pipeline has completed successfully", but I don't see that in the logs, so there must be a bug somewhere in the logic that is allowing the script to incorrectly exit successfully.
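
To make the intended pass condition concrete, here is a minimal sketch of a strict check. It is not the actual installVerify.sh logic; the function name `all_completed` is hypothetical, and treating an empty result as a failure is an assumption about what a strict check would do, not a diagnosis of the real bug.

```shell
# Hypothetical helper (not taken from installVerify.sh): succeed only if the
# space-separated list of reasons is non-empty and every entry is "Completed".
all_completed() {
  seen=0
  # Intentionally unquoted so the list is split on whitespace.
  for reason in $1; do
    [ "$reason" = "Completed" ] || return 1
    seen=1
  done
  # An empty list (no PipelineRuns found, or the query failed) is not a pass.
  [ "$seen" -eq 1 ]
}

# Usage sketch (assumes an authenticated oc session and a namespace variable):
#   reasons="$(oc get pr -n "${namespace}" \
#     -o=jsonpath='{.items[*].status.conditions[*].reason}')" || exit 1
#   if all_completed "${reasons}"; then
#     echo "Install pipeline has completed successfully"
#   else
#     exit 1
#   fi
```

The point of the sketch is that every path other than "all reasons are Completed" returns a non-zero status, so a later step such as getAdminURL.sh would not run after a failed install.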

I couldn't recreate this: I ran the end-to-end MAS DA flow twice from the private catalog (pointing to the latest code) on an existing cluster and didn't see the issue. I'm not sure what went wrong during the infra test in the CI pipeline. I'll try running the infra test to see if it can be recreated there.

@NatarajBTI it's probably an intermittent issue that may take several attempts to reproduce (or you may not reproduce it at all). Either way, there has to be a bug somewhere in your script logic that allowed it to incorrectly exit early. That's what I would focus on now.

I ran the automated infra tests and they were successful in two attempts. Please refer to this comment: #90 (comment).
The only change there was the cluster flavor used.

This issue can't be recreated.

@ocofaigh - Can we close this issue? The MAS DA deployment is stable now and this issue hasn't been seen in the several runs we've had.

Agree - issue was related to the health of the cluster. Since the docs will specify the cluster must be healthy before deploying, I'm going to close this