intel/intel-device-plugins-for-kubernetes

Prepare 0.30.0 release

Closed this issue · 23 comments

Checklist:

  • run validation on main
    • QAT (generic)
    • GNR (IAA, DSA)
    • GNR-D (QAT)
    • FPGA
    • SPR (SGX, QAT, GPU, IAA, DSA)
  • Make sure kube-rbac-proxy is the latest version
  • create release-0.30 branch
  • release branch changes
    • edit default_labels.docker + make dockerfiles
    • make set-version TAG=0.30.0 + commit
    • update publish.yml to create docs for v0.30
  • draft release notes, review
  • publish release
  • main branch changes
    • update base README for supported versions and docs URL
    • update main branch's operator CRs to point to 0.30, also reconciler.go
  • update helm chart: PR
    • Make sure to update CRDs and README
  • update operatorhub.io bundle

There is an error related to this commit.
https://github.com/k8s-operatorhub/community-operators/actions/runs/9137548241/job/25127696012?pr=4366#step:3:5083

Also, the CI/CD tests in operatorhub seem to use a specific namespace, 'testeupgrade', which may mean that we cannot publish the bundle as it is now.

I also tested locally, and it shows the same error messages.
In addition, when I remove the contents of the commit above, the test runs successfully.

What do we need to do?

Find out what the error is about and plan the fix accordingly. I'm not clear why it fails. Did you check what the test case is about and what we are doing wrong?

I wonder if it's some upgrade test where the changed labels cause confusion.

edit: nevermind, apparently I can't read.

The fix that is causing this was related to the operator bundle (or multiples of them) so reverting the fix would just re-introduce the issue. Kinda.

Thanks to the help of @tkatila, I figured out that it is not possible to change the labels from the previous version.

We added one more label compared to the previous version, so upgrading from the previous version is not possible.
I can see a similar case (https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/issues/7331).

I found one source that talks about solving this problem.
https://olm.operatorframework.io/docs/troubleshooting/clusterserviceversion/

So, we may need to publish a 0.30.0 that does not run into the problem caused by the new label, and then a 0.30.1 that would be the 'real' version of the published operator.
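
To illustrate why the added label blocks the upgrade, here is a minimal Python sketch. It assumes the failure comes from the Deployment's `spec.selector` being immutable in `apps/v1`, so an in-place update that alters the selector labels is rejected. The label keys and values are hypothetical placeholders, not the actual labels from the bundle.

```python
def selector_diff(old, new):
    """Compare two matchLabels dicts and report what differs."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


def can_update_in_place(old, new):
    """An existing Deployment can only be updated if its selector is unchanged."""
    return old == new


# Hypothetical labels, for illustration only.
old_selector = {"control-plane": "controller-manager"}
new_selector = {**old_selector, "example.com/extra": "value"}

print(selector_diff(old_selector, new_selector))
# → ({'example.com/extra': 'value'}, {}, {})
print(can_update_in_place(old_selector, new_selector))
# → False
```

Under this assumption, even a single added label is enough to make the update fail, which matches what the upgrade test reports.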

k8s-operatorhub/community-operators#4375
All tests passed there.
So, I guess we have two options.

  1. Change the name of the deployment from inteldeviceplugins-controller-manager to something else permanently
  2. Change the name of the deployment from inteldeviceplugins-controller-manager to something else temporarily and change it back in version 0.30.1.

The reason why I am suggesting the second option is because I do not know if we need to 'keep' the current name inteldeviceplugins-controller-manager.

@mythi @tkatila Let me know which way you think is better! :)

I'm trying to think of a way that would not include bumping up the version number and creating a patch release.

If we update the name permanently, what are the downsides for it? Some upgrade somewhere would result in two copies of the operator?
Are we sure a 0.29.0->0.30.0->0.30.1 upgrade path would work (changing the name back and forth)? What if the user upgrades directly from 0.29.0 to 0.30.1, wouldn't they get the same error?

  1. Change the name of the deployment from inteldeviceplugins-controller-manager to something else permanently

What happens to the old deployment if you add a new (renamed) one as part of the OLM upgrade?

I submitted a question to the community operators project: k8s-operatorhub/community-operators#4434

It seems that they are not replying.
Can we just proceed as the official document suggests (changing inteldeviceplugins-controller-manager to something else permanently)?

After discussing with @hj-johannes-lee I'd propose a transient deployment name change in the operator bundle:

  1. Release 0.30.0 with a different deployment name (only in operator bundle)
  2. Keep deployment name as-is in the main branch
  3. With 0.31.0 release in the operator bundle, the deployment name would "revert" back to the original one
  4. In the 0.31.0 release notes, we would make a note that upgrade from <=0.29.0 to 0.31.0 is not possible without going to 0.30.0 first.
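
The upgrade paths implied by these steps can be sketched as a small Python model. It assumes that an upgrade only fails when the deployment keeps its name while its selector labels change, and that a renamed deployment is simply created fresh. The name `renamed-controller-manager` and all label values are hypothetical, and 0.31.0 is assumed to keep the 0.30.0 labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleDeployment:
    name: str
    selector: frozenset  # set of (key, value) label pairs


def upgrade_ok(src, dst):
    """A renamed deployment is created fresh; a same-named one must keep its selector."""
    if src.name != dst.name:
        return True
    return src.selector == dst.selector


ORIGINAL = "inteldeviceplugins-controller-manager"
RENAMED = "renamed-controller-manager"  # hypothetical transient name

OLD_LABELS = frozenset({("control-plane", "controller-manager")})  # hypothetical
NEW_LABELS = OLD_LABELS | {("example.com/extra", "value")}  # hypothetical

BUNDLES = {
    "0.29.0": BundleDeployment(ORIGINAL, OLD_LABELS),
    "0.30.0": BundleDeployment(RENAMED, NEW_LABELS),
    "0.31.0": BundleDeployment(ORIGINAL, NEW_LABELS),
}


def path_ok(versions):
    """Check every consecutive hop of an upgrade path."""
    return all(
        upgrade_ok(BUNDLES[a], BUNDLES[b])
        for a, b in zip(versions, versions[1:])
    )


print(path_ok(["0.29.0", "0.30.0", "0.31.0"]))  # → True
print(path_ok(["0.29.0", "0.31.0"]))            # → False
```

In this model the direct 0.29.0 to 0.31.0 hop fails because the original name comes back with different labels, which is the reason for the release note in step 4.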

@mythi k8s-operatorhub/community-operators#4375
It's ready to be merged. If you agree to go forward, I'll get it merged.

What is the reason for step 3?

Umm, to be honest, I think there would be no problem with changing it to something else permanently (only when it comes to the operatorhub bundle). But Tuomas thought there might be some problems.

What about first getting the PR merged and then deciding about steps 3 and 4 later?

works for me. can you also submit a PR here to get that warning fixed?

nevermind, I just ran into #1785

Published. The deployment name is inteldeviceplugins-controller-manager-0-30-0

We can decide later if we change back to inteldeviceplugins-controller-manager or just create a new and permanent one.

The new name looks odd and forces us to make a change...

@mythi what name do you think is good?

something that is not attached to a specific version (e.g., 0-30-0)

Then how about renaming it from inteldeviceplugins-controller-manager to intel-deviceplugins-controller-manager?
k8s-operatorhub/community-operators#4743 (comment)

As discussed, let's keep the deployment name in the bundle the same as it is in 0.30.0, and change the deployment name in the project's YAML files. This will require manual changes for the 0.31.0 bundle, but 0.32.0 onward shouldn't require any manual edits.