Faster emergency rollout by skipping entire canary process
kangyawong-grabtaxi opened this issue
Describe the feature
What problem are you trying to solve?
Currently, we can use `skipAnalysis` to bypass most of the canary steps. However, in some cases the canary process can still be slow, especially when the pods take longer than usual to pass their probe checks (time to ready).
For instance, I have a few apps that take more than 10 minutes to pass their probe checks, and refactoring them isn't feasible in the foreseeable future.
While skipping the probe checks might seem risky, it can be particularly useful for emergency roll-forwards, when users are confident that the target spec works and want to replace the buggy pods as fast as possible. Therefore, it would be beneficial to have a way to skip the entire canary process during emergencies without directly patching the primary objects.
Proposed solution
I wonder if it's possible to add a new boolean flag to the Canary CRD that skips both the canary rollout status check and the analysis. Here's a POC.
A few drawbacks include:
- Skipping the compatibility check between ConfigMaps/Secrets (referenced via `configMapKeyRef`) and the application: existing primary pods may break if they react to changes in the mounted config files and the new config is not compatible with them
- In a way, the proposed flag is similar to a plain Kubernetes rolling update, but without having to uninstall Flagger to skip the canary
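To make the difference between `skipAnalysis` and the proposed flag concrete, here is a rough Go sketch that models a rollout as an ordered list of steps. It is a simplified model, not Flagger's controller code: `SkipAnalysis` stands in for the existing `skipAnalysis` field, `SkipCanary` is a hypothetical name for the proposed flag (the POC may name it differently), and the step names are illustrative only.

```go
package main

import "fmt"

// CanarySpec models only the two fields relevant to this proposal.
// SkipAnalysis stands in for Flagger's existing skipAnalysis field;
// SkipCanary is a HYPOTHETICAL name for the proposed flag.
type CanarySpec struct {
	SkipAnalysis bool
	SkipCanary   bool // hypothetical
}

// Step is one stage of a rollout in this simplified model.
type Step string

const (
	WaitCanaryReady  Step = "wait for canary pods to pass probes"
	RunAnalysis      Step = "run analysis"
	Promote          Step = "promote: copy canary spec to primary"
	WaitPrimaryReady Step = "wait for primary rollout"
)

// plan returns the ordered steps for a new revision under the given spec.
func plan(spec CanarySpec) []Step {
	if spec.SkipCanary {
		// Emergency path: promote right away and let the primary's own
		// rolling update replace the buggy pods.
		return []Step{Promote, WaitPrimaryReady}
	}
	// skipAnalysis still waits for the canary to become ready first,
	// which is the slow part for apps with long time-to-ready.
	steps := []Step{WaitCanaryReady}
	if !spec.SkipAnalysis {
		steps = append(steps, RunAnalysis)
	}
	return append(steps, Promote, WaitPrimaryReady)
}

func main() {
	for _, spec := range []CanarySpec{{}, {SkipAnalysis: true}, {SkipCanary: true}} {
		fmt.Printf("%+v -> %v\n", spec, plan(spec))
	}
}
```

In this model the only step `skipAnalysis` removes is the analysis itself, so an app with a 10-minute time-to-ready still pays that cost twice (canary, then primary), while the hypothetical flag pays it once.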
Any alternatives you've considered?
- Uninstalling or disabling Flagger might work, but it seems excessive for emergency deployments.
- A script to manually patch the primary objects: it's somewhat hacky and error-prone IMO, since I prefer to keep Flagger in charge of the normal deployment flow
- `canaryReadyThreshold`: the readiness check is still blocked by the canary's rolling update (while the old ReplicaSet is being replaced by the new one)
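A sketch of why a lower `canaryReadyThreshold` may not help: the Go function below is modelled on the ordering used by `kubectl rollout status`, which we assume (not verified here) that Flagger's canary readiness check follows. It first waits for all replicas to be updated and for old replicas to terminate, and only then applies the percentage threshold. `DeploymentStatus` is a plain struct borrowing field names from the apps/v1 API, not the real Kubernetes type, and the integer threshold arithmetic is a simplification.

```go
package main

import "fmt"

// DeploymentStatus holds only the fields this sketch needs. Field names
// follow apps/v1 DeploymentStatus, but this is NOT the Kubernetes type.
type DeploymentStatus struct {
	DesiredReplicas   int32
	UpdatedReplicas   int32 // pods running the new template, ready or not
	Replicas          int32 // total pods, old + new
	AvailableReplicas int32
}

// canaryReady returns nil once the rollout is considered done. The
// threshold is only evaluated after the rolling update has finished,
// so a low threshold cannot shortcut a slow rolling update.
func canaryReady(s DeploymentStatus, readyThresholdPercent int32) error {
	if s.UpdatedReplicas < s.DesiredReplicas {
		return fmt.Errorf("waiting for rollout: %d of %d new replicas updated",
			s.UpdatedReplicas, s.DesiredReplicas)
	}
	if s.Replicas > s.UpdatedReplicas {
		return fmt.Errorf("waiting for rollout: %d old replicas pending termination",
			s.Replicas-s.UpdatedReplicas)
	}
	threshold := s.DesiredReplicas * readyThresholdPercent / 100
	if s.AvailableReplicas < threshold {
		return fmt.Errorf("waiting for rollout: %d of %d available (threshold %d)",
			s.AvailableReplicas, s.DesiredReplicas, threshold)
	}
	return nil
}

func main() {
	// Mid rolling update (4 replicas, maxSurge 1, maxUnavailable 1): the new
	// pods are still failing probes, so the old ReplicaSet cannot shrink.
	midUpdate := DeploymentStatus{DesiredReplicas: 4, UpdatedReplicas: 2, Replicas: 5, AvailableReplicas: 3}
	for _, pct := range []int32{100, 50, 0} {
		fmt.Printf("threshold %3d%%: %v\n", pct, canaryReady(midUpdate, pct))
	}
}
```

With this mid-update status, every threshold (100%, 50%, 0%) returns the same "2 of 4 new replicas updated" error, because the update checks fire before the threshold is ever consulted.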
I'd be happy to discuss any other workarounds or suggestions 🙇♂️