kubernetes-sigs/cluster-api-provider-vsphere

Improve e2e testing

sbueringer opened this issue ยท 16 comments

Improve e2e tests

We compared CAPV govmomi & supervisor e2e tests with core CAPI e2e tests.
As a result we want to make the following improvements.

Adding e2e tests:

  • P0: Add v1.30 => v1.31 upgrade e2e test + check ci-latest (supervisor+govmomi) (@sbueringer)
    • PR: kubernetes/test-infra#32610
    • PR: #2999
    • Plan is to add more and more upgrades jobs for new Kubernetes versions until we have the same test coverage as core CAPI.
  • P0: Add test "When testing ClusterClass changes [ClusterClass]" (supervisor) (@zhanggbj)
  • P1: Add test "Ensure OwnerReferences and Finalizers are resilient" (supervisor) (@fabriziopandini)
  • P1: Also use ValidateResourceVersionStable func (kubernetes-sigs/cluster-api#10530) to ownerRef & finalizer tests (supervisor+govmomi) (@fabriziopandini) #3033
  • P1: Let's check if we have unit test coverage for the following tests for supervisor @chrischdi
  • P1: Add clusterctl upgrade tests n-2, n-1 (supervisor) (@zhanggbj)
    • CAPV 1.9=>current, CAPI 1.6=>1.7, CAPV 1.10=>current, CAPI 1.7=>1.7 (same we already have on main for govmomi)
    • PR: #3024
  • P1: Add test "When testing ClusterClass rollouts [ClusterClass]" (supervisor) (@zhanggbj)
  • P1: When upgrading a workload cluster using ClusterClass with RuntimeSDK [ClusterClass] (supervisor) @chrischdi
    • This requires implementing a minimal Runtime Extension (let's discuss before we start implementing it, and let's do the other more straightforward sub-tasks first)
    • CAPI PR: kubernetes-sigs/cluster-api#10788
    • CAPV PR: #3069
  • P1: Audit if we can run more tests with vcsim @chrischdi

After CAPV v1.11: Adding e2e tests:

  • P0: Add clusterctl upgrade tests n-3 (supervisor): #3159
  • P2: Add test "When using the autoscaler with Cluster API using ClusterClass" @chrischdi #3161

Other improvements:

Reconsider if we need it later:

  • Add test: When using the autoscaler with Cluster API using ClusterClass (incl. scale from zero coverage?)

govmomi has the following additional jobs (intentionally):

  • accepted: it's okay if it's only tested with govmomi:
    • capv-e2e.[It] Cluster creation with [Ignition] bootstrap [PR-Blocking] Should create a workload cluster
  • accepted: not surfaced in CAPV supervisor CRDs:
    • capv-e2e.[It] Cluster creation with anti affined nodes should create a cluster with anti-affined nodes
    • capv-e2e.[It] Cluster creation with storage policy should create a cluster successfully
    • capv-e2e.[It] DHCPOverrides configuration test when Creating a cluster with DHCPOverrides configured Only configures the network with the provided nameservers
  • accepted: IPAM provider not supported in supervisor mode
    • capv-e2e.[It] ClusterClass Creation using Cluster API quick-start test and IPAM Provider [ClusterClass] Should create a workload cluster

/area supervisor
Given that most of the improvements are for supervisor mode

I would like to start with

  • P0: Add test "When testing ClusterClass changes [ClusterClass]" (supervisor) (#3011 )

Also would like to pick up a similar one :-)

  • P1: Add test "When testing ClusterClass rollouts [ClusterClass]" (supervisor)

It seems this one is more about testing efforts. Just raised a PR to test.

  • P1: Add clusterctl upgrade tests n-2, n-1 (supervisor)

Yeah good case is that most if it just works once added :)

A few require a bit more work

Taking this one

  • P1: Let's check if we have unit test coverage for the following tests for supervisor

I created a PR for

  • P1: Let's check if we have unit test coverage for the following tests for supervisor @chrischdi
    • capv-e2e.[It] Label nodes with ESXi host info creates a workload cluster whose nodes have the ESXi host info

Regarding the other subpoint:

  • capv-e2e.[It] Hardware version upgrade creates a cluster with VM hardware versions upgraded

There is already unit-test coverage here:

We define the VSphereMachine here, the called function has a hardcoded value for vmx-17:

vsphereMachine = util.CreateVSphereMachine(machineName, clusterName, className, imageName, storageClass, controlPlaneLabelTrue)

Later we check the spec of the VirtualMachine to match the minHardwareVersion (again 17):

Expect(vmopVM.Spec.MinHardwareVersion).To(Equal(minHardwareVersion))

Plus an immutability check at the end:

By("Updates to immutable VMOp fields are dropped", func() {
vsphereMachine.Spec.ImageName = "new-image"
vsphereMachine.Spec.ClassName = "new-class"
vsphereMachine.Spec.StorageClass = "new-storageclass"
vsphereMachine.Spec.MinHardwareVersion = "vmx-9999"
vsphereCluster.Status.ResourcePolicyName = "new-resourcepolicy"
requeue, err = vmService.ReconcileNormal(ctx, supervisorMachineContext)
verifyOutput(supervisorMachineContext)
})

@chrischdi @fabriziopandini if I see correctly we don't have the cluster autoscaler test on the list, let's add it as P2?

@sbueringer I could take this one once #3024 is merged. Thanks!

- [ ] Add clusterctl upgrade tests n-3 (supervisor)

  • Add clusterctl upgrade tests n-3

@zhanggbj was just chatting with Christian and Fabrizio about it, we just realized that we didn't have supervisor e2e test in CAPV v1.8. So it would be a lot of effort to add v1.8 => main tests.

We would propose instead to wait until after the v1.11 release in ~ 6 weeks, and then we can just have v1.9 => main, v1.10 => main and v1.11 => main (so we get n-3 coverage just a bit later but with a lot less effort).

Probably we can just include it in the "Prepare main branch for development of the new release" task which is part of #3084. Usually we drop the oldest test there, but in that case we will simply keep it around.

@sbueringer Makes sense to me. This will provide n-3 tests with much less effort. Thanks!

@zhanggbj Sorry I was just working on prepare main branch and realized that "Add clusterctl upgrade tests n-3" will be covered by that, so I took over that task from you.

@sbueringer Just go ahead, no worries :)

@chrischdi Assigned Add test "When using the autoscaler with Cluster API using ClusterClass" to you + updated the PR description to add "part of ..."

Moved remaining sub-tasks to separate issues:

/close

Great work everyone!

@sbueringer: Closing this issue.

In response to this:

Moved remaining sub-tasks to separate issues:

/close

Great work everyone!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.