operator-framework/helm-operator-plugins

[Feature] Rollout Canaries across the OLM fleet for on-premise customers

SimonBaeumer opened this issue · 1 comments

Problem
The ACS operator runs on more than 1,500 clusters, most of which use automatic upgrades. As soon as we publish a new version of the ACS operator to the OpenShift catalog, all of these clusters try to upgrade immediately.
Because every cluster is affected at once, a failure in the upgrade process results in a large number of support tickets and a high impact on customer environments.

Solution
It would be great to have control over the rollout. Preferably, we could configure the order in which clusters upgrade automatically, so that if a rollout fails we can halt it and fix the issue before more clusters are affected.

Alternatively, an "upgrade available" endpoint that gates upgrades would be helpful. The operator would expose this endpoint and OLM would query it; OLM upgrades the operator only if the endpoint reports ready.
To implement canaries, the operator would connect to an ACS/Red Hat server, and we would implement the rollout process to fit our needs.
See related issue on upgrades endpoint: #232
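
To make the endpoint idea more concrete, here is a rough sketch of what the operator side could look like. Everything in it is an assumption for illustration: the `/upgrade-available` path, the `upgradeGate` type, the JSON field name, and the use of 200/503 as the ready/not-ready signal are hypothetical, and OLM does not query such an endpoint today. The flag would be set from whatever the ACS/Red Hat rollout server tells the operator; that call is left out here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// upgradeGate is a hypothetical type holding the operator's current answer
// to "may this cluster upgrade now?". In the proposal, the value would come
// from an ACS/Red Hat rollout server; that lookup is not shown.
type upgradeGate struct {
	allowed atomic.Bool
}

// statusCode maps the gate to an HTTP status:
// 200 if the upgrade may proceed, 503 if it should be held back.
func (g *upgradeGate) statusCode() int {
	if g.allowed.Load() {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// ServeHTTP answers the (assumed) query from OLM with the status code and a
// small JSON body carrying the same decision.
func (g *upgradeGate) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	code := g.statusCode()
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(map[string]bool{"upgradeAllowed": code == http.StatusOK})
}

func main() {
	gate := &upgradeGate{} // upgrades are held back by default

	// Exercise the handler in-process instead of opening a real port.
	// The path below is an assumption, not an existing OLM convention.
	for _, allow := range []bool{false, true} {
		gate.allowed.Store(allow)
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodGet, "/upgrade-available", nil)
		gate.ServeHTTP(rec, req)
		fmt.Println(rec.Code, rec.Body.String())
	}
}
```

In a real operator the handler would be registered on the existing HTTP server (for example next to the health probes) rather than driven through `httptest`. Defaulting to "not allowed" means a cluster that cannot reach the rollout server stays on its current version, which seems like the safer failure mode for a canary rollout.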

Sorry, wrong repository