k8ssandra/cass-operator

Operator upgrade should not cause a rolling restart


What is missing?

When updating the operator, there are two cases where we will actually cause a rolling restart of all Cassandra clusters managed by the same operator.

  • imageConfig updates which will ship with updates to k8ssandra-client and server-system-logger
  • Updates to PodTemplateSpec caused by something new we introduced / wanted to change

Now, while both are usually good updates, they shouldn't happen if the user makes no changes. We would hope that users upgrade to the newest builds of server-system-logger, for example, to get bug fixes and CVE-free base builds. Or, if we change Cassandra to use better settings, we usually do that for stability reasons. However, neither should happen without the user's knowledge, and neither should be an uncontrolled side effect of updating the operator.

Instead, we should note in the release notes when we think the user should do a rolling restart to get newer images or other improvements.

I would propose we modify the processing slightly in cass-operator. If the Generation of the CassandraDatacenter is equal to the ObservedGeneration, we would stop updating any spec we deploy, but keep doing the other parts of the processing (such as restarting nodes). Thus, this check can't happen at the beginning of the Reconcile function, but instead at the points where something would be updated. All decommissions, status updates to CassandraDatacenter/status and other maintenance should continue to work as before.
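A minimal sketch of that decision, assuming the usual Generation / ObservedGeneration semantics; the `datacenterMeta` struct and helper name here are stand-ins for illustration, not the real cass-operator types:

```go
// Sketch: deciding whether to push spec updates during reconciliation.
package main

import "fmt"

// datacenterMeta is a hypothetical stand-in for the fields of
// CassandraDatacenter that matter for this decision.
type datacenterMeta struct {
	Generation         int64 // metadata.generation, bumped when the user changes the spec
	ObservedGeneration int64 // status.observedGeneration, recorded after a successful reconcile
}

// shouldApplySpecUpdates returns true only when the user has changed the
// CassandraDatacenter spec since the last reconcile. When the generations
// match, the operator would skip StatefulSet/spec updates and only run the
// remaining processing (status updates, decommissions, restarts).
func shouldApplySpecUpdates(dc datacenterMeta) bool {
	return dc.Generation != dc.ObservedGeneration
}

func main() {
	// Operator was upgraded but the user changed nothing: no spec push.
	fmt.Println(shouldApplySpecUpdates(datacenterMeta{Generation: 4, ObservedGeneration: 4})) // false
	// User edited the spec: apply the updates as before.
	fmt.Println(shouldApplySpecUpdates(datacenterMeta{Generation: 5, ObservedGeneration: 4})) // true
}
```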

A rolling restart should trigger the upgrade to the StatefulSet specs and the other changes we detect. While this is a side effect, there's an additional benefit: self-healing. At the moment, it is possible to modify the StatefulSet directly and we will not revert it, which could let users make the cluster unstable while the operator does nothing to fix the situation. The current rolling restart is done by modifying a PodTemplateSpec annotation on the StatefulSet that records the restart time (like kubectl's rollout restart does), so we need to catch even this small change. What we have to do is prevent a double rolling restart, first from the annotation change and then from the cass-operator change. Not sure how to do that in every case, but that should be the goal.
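A rough sketch of the idea, assuming the same annotation key that kubectl rollout restart uses (whether cass-operator keeps that key or uses its own is an open detail); the helper name is made up for illustration:

```go
// Sketch: request a rolling restart by stamping the pod template, while the
// caller folds any pending PodTemplateSpec changes into the same update so
// only one restart happens instead of two.
package main

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
)

// Assumption: same annotation key that `kubectl rollout restart` writes.
const restartedAtAnnotation = "kubectl.kubernetes.io/restartedAt"

// requestRollingRestart mutates the desired StatefulSet in place. If the
// desired object already carries the operator's pending template changes,
// applying them together with the restart stamp results in a single rollout.
func requestRollingRestart(desired *appsv1.StatefulSet, now time.Time) {
	if desired.Spec.Template.Annotations == nil {
		desired.Spec.Template.Annotations = map[string]string{}
	}
	desired.Spec.Template.Annotations[restartedAtAnnotation] = now.UTC().Format(time.RFC3339)
}

func main() {
	sts := &appsv1.StatefulSet{}
	requestRollingRestart(sts, time.Now())
}
```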

We will need a test for all of this using the "upgrade_operator" test (and we should make it work with newer Cassandra versions as well) to detect cases where an operator upgrade would do a rolling restart, and fail the build in those cases.
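One way the test could assert this (a sketch, not the existing test code): capture the StatefulSet before the operator upgrade and verify afterwards that neither the pod template nor the update revision changed, since either would mean a rolling restart was triggered.

```go
// Sketch of the assertion the upgrade_operator test could make.
package main

import (
	"fmt"
	"reflect"

	appsv1 "k8s.io/api/apps/v1"
)

// restartedDuringUpgrade reports whether the operator upgrade caused a
// rollout: a new UpdateRevision or a changed pod template both mean the
// pods will be (or were) restarted.
func restartedDuringUpgrade(before, after *appsv1.StatefulSet) bool {
	return before.Status.UpdateRevision != after.Status.UpdateRevision ||
		!reflect.DeepEqual(before.Spec.Template, after.Spec.Template)
}

func main() {
	before := &appsv1.StatefulSet{}
	after := before.DeepCopy()
	fmt.Println(restartedDuringUpgrade(before, after)) // false: upgrade was safe
}
```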

Possible issues:

  • An update to the operator might genuinely require an update to server-system-logger, for example because the operator can't function without those modifications. However, the same problem could arise with management-api versions as well, and we already have detection for that. So a similar approach should work, and we could simply emit events and log messages indicating to users that they really should do a restart at some point.
  • How many upgrades can we accept? What if users keep updating the operator, but the cluster hasn't been restarted in years? Probably a rare issue, but not impossible. Can we still maintain a cluster that is based on a configuration a very old cass-operator version created? Should we add an annotation that indicates which version of cass-operator created the resources, so we could detect such cases (see the sketch after this list)? It could fit into the existing "created-by / managed-by" annotations without making a huge difference. We would treat a missing annotation as 1.16.0, for example.
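A sketch of how that version annotation could be read; the annotation key is hypothetical, and the fallback to "1.16.0" follows the idea above that a missing annotation is treated as the last version before the annotation existed:

```go
// Sketch: reading which cass-operator version created the resources, so we
// can detect clusters whose StatefulSets were built by a very old operator.
package main

import "fmt"

// Hypothetical annotation key, not an existing cass-operator annotation.
const createdByVersionAnnotation = "cassandra.datastax.com/operator-version"

func operatorVersionFor(annotations map[string]string) string {
	if v, ok := annotations[createdByVersionAnnotation]; ok && v != "" {
		return v
	}
	// No annotation: the resource predates this feature, assume 1.16.0.
	return "1.16.0"
}

func main() {
	fmt.Println(operatorVersionFor(nil))                                                     // "1.16.0"
	fmt.Println(operatorVersionFor(map[string]string{createdByVersionAnnotation: "1.22.1"})) // "1.22.1"
}
```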

Why is this needed?

Updating the operator should be safe for the cluster - always. We should not make people run old versions of cass-operator because they're afraid of updating or have no way to restart all their clusters. Multiple clusters managed by the same cass-operator are especially problematic, as restarting all of them at once would cause heavy load on the underlying Kubernetes cluster.


Tentatively supported. I'll look forward to watching your discussion, but it sounds like you've thought this through.

I do think perhaps we should log errors or something if the user has done 1-2 consecutive upgrades without a rolling restart, but I concur that they should always be in control of the timing of the restart itself.

@burmanm, since this ticket is sized L, could you split it up into multiple subtasks? We'll make this one an epic.
Thanks!

  • We would introduce a new annotation on the CassandraDatacenter objects that allows self-healing of the StatefulSets, thus disabling the new behavior of not automatically fixing them
  • In order to apply the upgrade changes, we can rely on a new annotation cassandra.datastax.com/allow-sts-upgrade=true

These two sound closely related. To keep things simple, maybe we could use a single annotation with different possible values: cassandra.datastax.com/allow-sts-upgrade=always|once

We could; "once" would be removed after applying it, while "always" would stay in that case.
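A minimal sketch of consuming that annotation with the "always|once" scheme discussed above; only the annotation key and values come from this thread, the helper name is made up for illustration:

```go
// Sketch: interpreting the proposed cassandra.datastax.com/allow-sts-upgrade
// annotation on a CassandraDatacenter.
package main

import "fmt"

const allowStsUpgradeAnnotation = "cassandra.datastax.com/allow-sts-upgrade"

// stsUpgradeAllowed reports whether spec upgrades may be pushed to the
// StatefulSets, and whether the annotation should be removed once applied.
func stsUpgradeAllowed(annotations map[string]string) (allowed bool, removeAfterApply bool) {
	switch annotations[allowStsUpgradeAnnotation] {
	case "always":
		return true, false // keep the annotation; StatefulSets also self-heal
	case "once":
		return true, true // apply pending changes, then drop the annotation
	default:
		return false, false // default: operator upgrades never touch the StS spec
	}
}

func main() {
	allowed, removeAfter := stsUpgradeAllowed(map[string]string{allowStsUpgradeAnnotation: "once"})
	fmt.Println(allowed, removeAfter) // true true
}
```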

From the other operators' side:

If both k8ssandra-operator and cass-operator have upgrades, we might need only k8ssandra-operator to apply a new CassandraDatacenter. But we might also have a case where only cass-operator has updates and k8ssandra-operator doesn't.

So our K8ssandraTask needs more logic than the CassandraTask.