kyma-project/kyma

Support maintenance window for module upgrades

pbochynski opened this issue · 13 comments

Description
Requirements:

  • possibility to mark versions that require a maintenance window
  • possibility to assign clusters to maintenance windows (maintenance windows can be different for regions, customers, etc)

Modules should still be reconciled to make sure that the Manifest is healthy, but module upgrades should happen ONLY during the maintenance window.

Reasons

Some module upgrades can come with service disruption that should be handled within the maintenance window.

Implementation Concept

  • Module version should get an additional attribute (requires downtime)
  • Maintenance windows should be assigned to the landscape, region, or individual Kyma instance (maybe annotation in the Kyma CR in KCP)
  • KLM should respect maintenance windows
  • The user should be able to override the default maintenance window (annotating Kyma CR in the cluster)

Acceptance Criteria

  • Describe the Maintenance Window behaviour
    • Given a Kyma Module in the Kyma Module Catalogue
      • When a new Kyma Module version is added to the Kyma Module Catalogue
        • Then an entry in Module Catalogue is updated
          • And the Kyma Module entry contains information whether a new version of Kyma Module should be upgraded during the Maintenance Window
      • When the Lifecycle Manager reconciles Kyma Module
        • And the new version of the Kyma Module is available and requires a Maintenance Window
          • And the Maintenance Window is currently active
            • Then the Kyma Module is upgraded to the new version
          • And the Maintenance Window is currently not active
            • Then the Kyma Module is not upgraded; it keeps being reconciled in its current version until the Maintenance Window is active
          • ❓ And the User chose to ignore maintenance windows feature
            • Then the Kyma Module is upgraded to the new version
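The acceptance criteria above can be read as a small decision function. The following is a minimal sketch of that reading, not KLM code: the `UpgradeInput` type, its field names, and `ShouldUpgrade` are hypothetical, and the "ignore maintenance windows" branch is still marked as an open question in the criteria.

```go
package main

// UpgradeInput is a hypothetical summary of what the reconciler knows
// about one module at reconcile time. None of these names come from KLM.
type UpgradeInput struct {
	NewVersionAvailable bool // a newer module version exists in the catalogue
	RequiresWindow      bool // the new version is flagged as requiring a maintenance window
	WindowActive        bool // the cluster's maintenance window is currently active
	IgnoreWindows       bool // the user opted out of maintenance windows (open question in the AC)
}

// ShouldUpgrade returns true when the module may move to the new version now.
// When it returns false, the module is expected to keep being reconciled
// in its current version.
func ShouldUpgrade(in UpgradeInput) bool {
	if !in.NewVersionAvailable {
		return false // nothing to upgrade to
	}
	if !in.RequiresWindow {
		return true // upgrade does not need a maintenance window
	}
	if in.IgnoreWindows {
		return true // user chose to ignore the maintenance window feature
	}
	return in.WindowActive // otherwise wait for the window
}
```

A caller would evaluate `ShouldUpgrade` on every reconciliation and only switch the module version when it returns true, so reconciliation itself is never paused.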

Multiple versions of the modules will be introduced in this epic: kyma-project/lifecycle-manager#1472

Until then, multiple module versions are not persisted, so this ticket is blocked.

Acceptance Criteria for EPIC:

  • Research about the topic
  • Think about a proper issue split for this EPIC
  • Create Sub-Issues to cover the implementation
  • Regarding business requirements talk to @janmedrek if needed

Timebox: 2 Days

Sub Issues:

@janmedrek @kyma-project/jellyfish The specification of the maintenance window policy to be supported by KLM:
https://github.tools.sap/kyma/backlog/issues/5462#issuecomment-6963879

The same will be supported (enhanced) in the orchestration operator developed by us.

@janmedrek when do you plan to start working on this epic?

Starting from 2025 Q1 we must execute all module rollouts impacting availability in the harmonized BTP major upgrade windows. Hence this feature of KLM should be delivered by the end of Q4 at the latest.

Existing code for parsing policies and resolving maintenance windows - internal link

We would need the following information from KEB to be put on the Kyma CR in KCP:

  "globalAccountID": string
  "plan": string
  "region": string
  "platformRegion": string

Kyma CRs are created by KEB from the @kyma-project/gopher team.
Regarding all the maintenance window policies/rules/specifications please reach out to the SRE team.

Current labels on the Kyma CRs on stage:

labels:
    kyma-project.io/broker-plan-id: 
    kyma-project.io/broker-plan-name:
    kyma-project.io/global-account-id: 
    kyma-project.io/instance-id: 
    kyma-project.io/platform-region: 
    kyma-project.io/provider:
    kyma-project.io/region: 
    kyma-project.io/runtime-id: 
    kyma-project.io/shoot-name: 
    kyma-project.io/subaccount-id: 
    operator.kyma-project.io/beta: 
    operator.kyma-project.io/internal: 
    operator.kyma-project.io/kyma-name: 
    operator.kyma-project.io/managed-by: 

Should be sufficient if kyma-project.io/broker-plan-name corresponds to plan in the Maintenance Window Policy CRs
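As a rough illustration of how the four requested values could be read from the labels listed above, here is a minimal sketch. The label keys are copied from the list; the `MatchInput` type and `MatchInputFromLabels` function are hypothetical names introduced only for this example, and the sketch assumes that `kyma-project.io/broker-plan-name` carries the plan.

```go
package main

// Label keys as they appear on the Kyma CRs on stage.
const (
	labelGlobalAccountID = "kyma-project.io/global-account-id"
	labelPlanName        = "kyma-project.io/broker-plan-name"
	labelRegion          = "kyma-project.io/region"
	labelPlatformRegion  = "kyma-project.io/platform-region"
)

// MatchInput is a hypothetical container for the values a maintenance
// window policy would be matched against.
type MatchInput struct {
	GlobalAccountID string
	Plan            string
	Region          string
	PlatformRegion  string
}

// MatchInputFromLabels reads the four values from the Kyma CR labels and
// also returns the keys of any labels that are absent or empty, so the
// caller can decide how to handle an incomplete CR.
func MatchInputFromLabels(labels map[string]string) (MatchInput, []string) {
	var missing []string
	get := func(key string) string {
		value, ok := labels[key]
		if !ok || value == "" {
			missing = append(missing, key)
		}
		return value
	}
	in := MatchInput{
		GlobalAccountID: get(labelGlobalAccountID),
		Plan:            get(labelPlanName),
		Region:          get(labelRegion),
		PlatformRegion:  get(labelPlatformRegion),
	}
	return in, missing
}
```

Returning the missing keys instead of an error keeps the choice open between falling back to a default window and blocking the upgrade.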

@ebensom perhaps it would make sense to make a separate Go library for that? I believe that would be beneficial as it seems multiple sources will re-use that logic to resolve maintenance windows.

@janmedrek that's a good idea. Should we put it somewhere in the kyma-project/kyma repo, under pkg?

Regarding AC:

  • When the Lifecycle Manager reconciles Kyma Module
    • And the new version of the Kyma Module is available and requires a Maintenance Window
      • And the Maintenance Window is currently not active
        • Then the Kyma Module is not upgraded; it keeps being reconciled in its current version until the Maintenance Window is active

With the new module metadata this should be implementable cleanly. If we encounter a module update requiring downtime, e.g. from MT-1.0.0 to MT-2.0.0, we can delay the update by just pointing to MT-1.0.0 and leave the downstream processing from ModuleTemplate to Manifest as is.

With the current module metadata this is harder. The ModuleTemplate, e.g. MT-regular, changes in place, so we don't have an "old" ModuleTemplate we can point to. We will need to flag this situation somewhere in the module processing and, in the downstream processing (ModuleTemplate to Manifest), skip updating the Manifest.

@kyma-project/gopher can somebody confirm this one pls?

Should be sufficient if kyma-project.io/broker-plan-name corresponds to plan in the Maintenance Window Policy CRs

E.g., for a random Kyma on STAGE I can see kyma-project.io/broker-plan-name: aws, but matching rules for the maintenance window policy look like "match": { "plan": "trial|free" }. Therefore not so sure if this is the correct mapping.

kyma-project.io/broker-plan-name is the correct label determining the plan name.

Was confirmed "offline", thanks @jaroslaw-pieszka!
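To make the plan-matching question from this thread concrete, here is a minimal sketch of checking label values against a rule such as `"match": { "plan": "trial|free" }`. It assumes that each rule value is a regular expression that must match the whole field value; the actual semantics are defined by the maintenance window policy specification, which is not reproduced here. `MatchesRule` is a hypothetical helper, not part of any existing library.

```go
package main

import (
	"fmt"
	"regexp"
)

// MatchesRule reports whether every field named in rule matches the
// corresponding entry in values. Each rule value is treated as a regular
// expression anchored to the full field value (an assumption of this sketch).
func MatchesRule(rule, values map[string]string) (bool, error) {
	for field, pattern := range rule {
		re, err := regexp.Compile("^(?:" + pattern + ")$")
		if err != nil {
			return false, fmt.Errorf("invalid pattern for field %q: %w", field, err)
		}
		// A field missing from values is matched as the empty string.
		if !re.MatchString(values[field]) {
			return false, nil
		}
	}
	return true, nil
}
```

Under these assumptions, a Kyma with the plan `aws` does not match the rule `trial|free`, while `trial` and `free` do. Whether such a Kyma then falls through to another rule or to a default window depends on the policy specification.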