How to troubleshoot a failure during run bundle-upgrade
etsauer opened this issue · 1 comment
Type of question
Best practices
How to implement a specific feature
Question
I'm trying to test the upgrade path for my Helm-based operator using operator-sdk run bundle. The upgrade fails because of a missing install plan, and I'm not sure how to get more information about what I'm doing wrong.
What did you do?
# Install current release of the operator
operator-sdk run bundle quay.io/pelorus/pelorus-operator-bundle:v0.0.9 --namespace test-pelorus-operator
# Once installed successfully, attempt the upgrade
operator-sdk run bundle-upgrade quay.io/pelorus/rc-pelorus-operator-bundle:vpr1157-34d9eef --namespace test-pelorus-operator --verbose
What did you expect to see?
I hoped to see the upgrade succeed.
What did you see instead? Under which circumstances?
The upgrade failed after the old registry pod was deleted:
INFO[0018] Generated a valid Upgraded File-Based Catalog
INFO[0020] Created registry pod: quay-io-pelorus-rc-pelorus-operator-bundle-vpr1157-34d9eef
INFO[0020] Updated catalog source pelorus-operator-catalog with address and annotations
INFO[0021] Deleted previous registry pod with name "quay-io-pelorus-pelorus-operator-bundle-v0-0-9"
FATA[0120] Failed to run bundle upgrade: install plan is not available for the subscription pelorus-operator-v0-0-9-sub: context deadline exceeded
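When OLM never creates an InstallPlan for a subscription, the subscription's status conditions usually say why. A few diagnostic commands worth running (a sketch; the resource names are taken from the output above, and the flags are standard oc/kubectl options):

```shell
# Inspect the subscription's status conditions (e.g. ResolutionFailed)
# for the reason no InstallPlan was generated:
oc get subscription pelorus-operator-v0-0-9-sub -n test-pelorus-operator -o yaml

# Verify the registry pod backing the upgraded catalog is Running and Ready:
oc get pods -n test-pelorus-operator

# Check the catalog source's connection state (should be READY):
oc describe catalogsource pelorus-operator-catalog -n test-pelorus-operator
```

If the catalog source never reaches READY, the subscription cannot resolve and the bundle-upgrade command times out exactly as shown above.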
I also see the following subscriptions and installplans:
$ oc get subscription -n test-pelorus-operator
NAME                                                            PACKAGE            SOURCE                     CHANNEL
grafana-operator-v4-community-operators-openshift-marketplace   grafana-operator   community-operators        v4
pelorus-operator-v0-0-9-sub                                     pelorus-operator   pelorus-operator-catalog   operator-sdk-run-bundle
prometheus-beta-community-operators-openshift-marketplace       prometheus         community-operators        beta
$ oc get installplan -n test-pelorus-operator
NAME            CSV                       APPROVAL   APPROVED
install-t2fjd   grafana-operator.v4.8.0   Manual     true
NOTE: the prometheus and grafana operators are dependencies of this operator, which is why you see them in this namespace.
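Two further checks that often explain a missing install plan (a sketch under assumptions: the OLM namespace below is the OpenShift default, and the CSV name in the grep comment is a guess based on the package name):

```shell
# The catalog-operator logs typically record resolution errors that never
# surface in the subscription itself:
oc logs -n openshift-operator-lifecycle-manager deployment/catalog-operator --tail=100

# Confirm the new bundle declares an upgrade edge from the installed version.
# Extract the bundle's manifests and look for a replaces/skipRange entry
# (the expected value would be something like pelorus-operator.v0.0.9):
oc image extract quay.io/pelorus/rc-pelorus-operator-bundle:vpr1157-34d9eef --path /manifests/:/tmp/bundle-manifests
grep -R "replaces\|skipRange" /tmp/bundle-manifests
```

If the new CSV neither replaces the installed CSV nor covers it with a skipRange, OLM has no upgrade path and will not create an install plan.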
Environment
Operator type:
/language helm
Kubernetes cluster type:
OpenShift 4.15
$ operator-sdk version
operator-sdk version: "v1.33.0", commit: "542966812906456a8d67cf7284fc6410b104e118", kubernetes version: "1.27.0", go version: "go1.21.5", GOOS: "linux", GOARCH: "amd64"
$ kubectl version
Additional context
This happens with or without the operand resource created, so it seems to be a fairly basic issue, perhaps with how the bundle is configured.
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale