HCO deployment via OLM on plain k8s is failing due to SSP operator

Question

tiraboschi opened this issue 2 years ago · 1 comments

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Currently we build a single bundle for community-operator,
and a cluster admin can deploy OLM on their cluster and then use it to deploy our bundle.
That bundle always tries to start the SSP operator as well, although the SSP operator cannot run on anything other than OCP/OKD.

So the SSP operator fails with something like:

{"level":"info","ts":1666961995.2290175,"logger":"setup","msg":"OLM cert directory found, copying cert files"}
I1028 12:59:56.281001       1 request.go:601] Waited for 1.036346685s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/hco.kubevirt.io/v1beta1?timeout=32s
{"level":"info","ts":1666961996.3323717,"logger":"setup","msg":"Found namespace","Namespace":"kubevirt-hyperconverged"}
{"level":"info","ts":1666961996.3347092,"logger":"setup","msg":"Starting Prometheus metrics endpoint server with TLS"}
{"level":"info","ts":1666961997.4385593,"logger":"controller-runtime.builder","msg":"skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called","GVK":"ssp.kubevirt.io/v1beta1, Kind=SSP"}
{"level":"info","ts":1666961997.438605,"logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"ssp.kubevirt.io/v1beta1, Kind=SSP","path":"/validate-ssp-kubevirt-io-v1beta1-ssp"}
{"level":"info","ts":1666961997.4386854,"logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-ssp-kubevirt-io-v1beta1-ssp"}
{"level":"error","ts":1666961997.6255345,"msg":"Some required crds are missing. The operator will not create any new resources.","missingCrds":["datavolumes.cdi.kubevirt.io","datasources.cdi.kubevirt.io","dataimportcrons.cdi.kubevirt.io","prometheusrules.monitoring.coreos.com"],"stacktrace":"kubevirt.io/ssp-operator/controllers.CreateAndStartReconciler\n\t/workspace/controllers/setup.go:58\nmain.main\n\t/workspace/main.go:184\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
{"level":"error","ts":1666962000.0374146,"logger":"setup","msg":"unable to create or start controller","controller":"SSP","error":"failed to get infrastructure topology: no matches for kind \"Infrastructure\" in version \"config.openshift.io/v1\"","stacktrace":"main.main\n\t/workspace/main.go:185\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

and so the CSV is marked as failed:

$ kubectl get csv -n kubevirt-hyperconverged
NAME                                                   DISPLAY                                    VERSION              REPLACES   PHASE
kubevirt-hyperconverged-operator.v1.9.0-202210261054   KubeVirt HyperConverged Cluster Operator   1.9.0-202210261054              Failed

although everything else is fully operational.
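
To isolate the failing entries from a log like the one above, one option is to keep only the error-level JSON lines. This is a minimal sketch: the function name is hypothetical, and the deployment name `ssp-operator` in the usage comment is an assumption that is not confirmed in this report.

```shell
# Print only the error-level JSON log lines read from stdin.
# klog-style lines (e.g. "I1028 ...") do not contain the level field and are dropped.
ssp_error_lines() {
  grep -F '"level":"error"'
}

# Hypothetical usage; the deployment name "ssp-operator" is an assumption:
# kubectl logs -n kubevirt-hyperconverged deployment/ssp-operator | ssp_error_lines
```

Applied to the log above, this would leave the two entries about the missing CRDs and the missing `Infrastructure` kind.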

What you expected to happen:
The HCO bundle can be successfully deployed via OLM on vanilla K8s as well.

How to reproduce it (as minimally and precisely as possible):

deploy k8s
deploy OLM via:

curl -L https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.22.0/install.sh -o install.sh
chmod +x install.sh
./install.sh v0.22.0

deploy HCO via OLM

Anything else we need to know?:

hack/deploy.sh and deploy/deploy.sh are explicitly bypassing the SSP operator when not on OCP/OKD
we should also add a CI lane running on kubevirtci deploying OLM and then HCO via OLM on plain k8s
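
The bypass mentioned above depends on telling OCP/OKD apart from plain k8s. One way to sketch such a check is to look for the `config.openshift.io` API group, the same group whose `Infrastructure` kind the SSP operator fails to find in the log. This is only an illustration under that assumption, not the logic actually used in hack/deploy.sh or deploy/deploy.sh; the function name is hypothetical.

```shell
# Succeeds (exit 0) if the API version list on stdin contains the
# config.openshift.io group; fails (exit 1) otherwise.
# Assumption: the presence of this group is a good enough signal for OCP/OKD.
has_openshift_config_api() {
  grep -q '^config\.openshift\.io/'
}

# Usage against a live cluster ("kubectl api-versions" prints one group/version per line):
# if kubectl api-versions | has_openshift_config_api; then
#   echo "OCP/OKD detected: SSP operator can be deployed"
# else
#   echo "plain k8s: skip the SSP operator"
# fi
```

Reading the list from stdin keeps the check independent of how the API versions are obtained, so the same function could be exercised in a CI lane without a cluster.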

Environment:

HCO version (use oc get csv -n kubevirt-hyperconverged): v1.9.0-202210261054
Kubernetes version (use kubectl version): v1.23.13
Cloud provider or hardware configuration: all
Install tools: OLM
Others:

Answer 1 · 2022-11-16T09:15:51.000Z

We can close it after releasing a new SSP version and bumping it in HCO.