kubevirt/hyperconverged-cluster-operator

Could not delete vmi on OKD Cluster

Closed this issue · 7 comments

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
To delete the VMI, I ran kubectl delete -f vm.yaml from the CLI, but the command gets stuck, as shown below:
image

Before deleting it, I tried to open the VMI console in the OKD WebUI, but the console was not shown.
A 'service not found' message also appeared when I tried to delete or stop the VMI from the OKD WebUI.
image

Even after the virt-launcher-xxx pod was deleted, the VMI was still not deleted.
How can I solve this?

What you expected to happen:
The VMI is deleted.

How to reproduce it (as minimally and precisely as possible):

  1. Create a VMI
  2. Delete the VMI (a 'service not found' error occurs)
  3. Delete the virt-launcher-xxxx pod
  4. Try to delete the VMI with the CLI and the WebUI -> the VMI is not deleted
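
In Kubernetes, an object that survives a delete request, as in step 4, has usually been marked for deletion (metadata.deletionTimestamp is set) while one or more entries remain in metadata.finalizers. The following is a minimal sketch for checking that state in the JSON printed by kubectl get vmi <name> -o json. It is not a confirmed diagnosis of this issue. The sample object, the VMI name testvmi, the finalizer value, and the helper name pending_finalizers are all assumptions made for illustration.

```python
import json


def pending_finalizers(obj: dict) -> list:
    """Return the finalizers still blocking deletion.

    Returns an empty list when the object has not been marked for
    deletion (no metadata.deletionTimestamp) or has no finalizers.
    """
    meta = obj.get("metadata", {})
    if not meta.get("deletionTimestamp"):
        return []
    return list(meta.get("finalizers") or [])


# Hypothetical, heavily trimmed output of `kubectl get vmi testvmi -o json`.
# The name, timestamp, and finalizer below are illustrative assumptions.
SAMPLE = """
{
  "apiVersion": "kubevirt.io/v1",
  "kind": "VirtualMachineInstance",
  "metadata": {
    "name": "testvmi",
    "deletionTimestamp": "2021-08-23T04:00:00Z",
    "finalizers": ["foregroundDeleteVirtualMachine"]
  }
}
"""

vmi = json.loads(SAMPLE)
blocking = pending_finalizers(vmi)
if blocking:
    print("deletion is waiting on finalizers:", ", ".join(blocking))
else:
    print("object is not stuck in deletion")
```

If the list is non-empty, the controller that owns those finalizers has not finished its cleanup, which is worth checking before anything else.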

Anything else we need to know?:

Environment:

  • HCO version (use oc get csv -n kubevirt-hyperconverged): 1.5.0
  • Kubernetes version (use kubectl version): 1.18
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others: OKD 4.5 Cluster

@davidvossel - could you please take a look?

@rupang790 - I'm not sure I get the issue right. What do you mean by (2) and what is the difference from (4)?

Also, OKD 4.5 is too old for HCO 1.5.0. Please use OKD 4.8.

@nunnatsa
Regarding the OKD version, I cannot upgrade to 4.8 for now, because we are currently trying to stay on a specific OKD version.

After rebooting the VM from the VMI console with sudo systemctl reboot, I could no longer manage the VMI lifecycle (start, restart, or stop) from the WebUI at all.
To sum up, I can no longer manage the VMI via the WebUI, so I tried to remove it with the oc command, but that failed. As a last resort, I removed the virt-launcher-xxx pod, but the VMI still appears in the WebUI.

Please consume HCO 1.3.0 on OKD 4.5.0

@tiraboschi, as you recommended:

Please consume HCO 1.3.0 on OKD 4.5.0

I tried to install HCO version 1.3.0. I switched the git branch to release-4.6, which uses HCO 1.3.0, and ran deploy.sh to install it on my OKD 4.5 cluster. However, there seem to be some issues with `node-maintenance-operator` (I tried multiple versions, from 0.7.0 to 0.9.0).
You can see its error logs below:

{"level":"info","ts":1629693798.0988977,"logger":"cmd","msg":"Go Version: go1.15.6"}
{"level":"info","ts":1629693798.099137,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1629693798.099201,"logger":"cmd","msg":"Version of operator-sdk: v0.18.2"}
{"level":"info","ts":1629693798.0992622,"logger":"cmd","msg":"Operator Version: v0.9.0"}
{"level":"info","ts":1629693798.099303,"logger":"cmd","msg":"Git Commit: e3c8bbf0e449fa15ed744c0c3ab16c28129f004f"}
{"level":"info","ts":1629693798.099339,"logger":"cmd","msg":"Build Date: 2021-05-25T15:12:52+00:00"}
I0823 04:43:19.150882       1 request.go:621] Throttling request took 1.029648802s, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1?timeout=32s
{"level":"info","ts":1629693808.1519032,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1629693808.153093,"logger":"cmd","msg":"Registering Components."}
I0823 04:43:29.229154       1 request.go:621] Throttling request took 1.04475203s, request: GET:https://172.30.0.1:443/apis/admissionregistration.k8s.io/v1beta1?timeout=32s
{"level":"info","ts":1629693811.3307118,"logger":"cmd","msg":"failed to create or get service for metrics: services \"node-maintenance-operator-metrics\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"error","ts":1629693811.3308554,"logger":"cmd","msg":"Failed to prepare webhook server, certificates not found","error":"stat /apiserver.local.config/certificates/apiserver.crt: no such file or directory","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/kubevirt.io/node-maintenance-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.setupWebhookServer\n\t/go/src/kubevirt.io/node-maintenance-operator/cmd/manager/main.go:156\nmain.main\n\t/go/src/kubevirt.io/node-maintenance-operator/cmd/manager/main.go:136\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}
{"level":"error","ts":1629693811.331004,"logger":"cmd","msg":"Failed to setup webhook server","error":"stat /apiserver.local.config/certificates/apiserver.crt: no such file or directory","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/kubevirt.io/node-maintenance-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/kubevirt.io/node-maintenance-operator/cmd/manager/main.go:137\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}
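
The log above mixes two formats: JSON lines written by the operator's logger and klog-style lines such as the "Throttling request" entries. As a minimal sketch for pulling out only the error entries, the snippet below tries to parse each line as JSON, skips lines that are not JSON, and keeps those whose level is error. The sample lines are abridged from the log above (the stacktrace fields are dropped), and the function name error_entries is hypothetical.

```python
import json


def error_entries(lines):
    """Yield (msg, error) pairs for JSON log lines whose level is 'error'."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # klog-style line, not JSON
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed or truncated JSON line
        if entry.get("level") == "error":
            yield entry.get("msg"), entry.get("error")


# Lines abridged from the operator log in this issue (stacktraces omitted).
SAMPLE_LOG = [
    '{"level":"info","ts":1629693808.153093,"logger":"cmd","msg":"Registering Components."}',
    'I0823 04:43:29.229154       1 request.go:621] Throttling request took 1.04475203s, request: GET:https://172.30.0.1:443/apis/admissionregistration.k8s.io/v1beta1?timeout=32s',
    '{"level":"error","ts":1629693811.3308554,"logger":"cmd","msg":"Failed to prepare webhook server, certificates not found","error":"stat /apiserver.local.config/certificates/apiserver.crt: no such file or directory"}',
    '{"level":"error","ts":1629693811.331004,"logger":"cmd","msg":"Failed to setup webhook server","error":"stat /apiserver.local.config/certificates/apiserver.crt: no such file or directory"}',
]

for msg, err in error_entries(SAMPLE_LOG):
    print(f"{msg}: {err}")
```

Note that the "failed to create or get service for metrics" message in the log is emitted at info level, so a level filter like this one would not surface it.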

Could you please help me solve this problem? Due to some constraints, I cannot use another version of OKD for now.

Can you please try deploying from the release-1.3 branch?
I'll try reproducing as well.

I successfully deployed and tested kubevirt-hyperconverged-operator.v1.3.0 from the stable channel of community-operators on OKD 4.5