k8ssandra/cass-operator

Operator upgrade should not cause a rolling restart


What is missing?

When updating the operator, there are two cases where we will actually cause a rolling restart of all Cassandra clusters managed by the same operator.

  • imageConfig updates which will ship with updates to k8ssandra-client and server-system-logger
  • Updates to PodTemplateSpec caused by something new we introduced / wanted to change

Now, while both are usually good updates, they shouldn't happen if the user makes no changes. We would hope that users upgrade to the newest builds of server-system-logger, for example, to get bug fixes and CVE-free base builds. Or, if we change Cassandra to use better settings, we usually do that for stability reasons. However, neither should happen without the user's knowledge, and neither should be an uncontrolled side effect of updating the operator.

Instead, we should note in the release notes when we think the user should do a rolling restart to get newer images or other improvements.

I would propose we modify the processing slightly in cass-operator. If the Generation of the CassandraDatacenter is equal to the ObservedGeneration, we would stop updating any spec we deploy, but keep doing the other parts of the processing (such as restarting nodes). Thus, this check can't happen at the beginning of the Reconcile function, but instead at the points where something would be updated. All decommissions, status updates to CassandraDatacenter/status and other maintenance should continue to work as before.
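A minimal sketch of that decision, assuming the usual Generation / ObservedGeneration semantics; the `datacenterMeta` struct and helper name here are stand-ins for illustration, not the real cass-operator types:

```go
// Sketch: deciding whether to push spec updates during reconciliation.
package main

import "fmt"

// datacenterMeta is a hypothetical stand-in for the fields of
// CassandraDatacenter that matter for this decision.
type datacenterMeta struct {
	Generation         int64 // metadata.generation, bumped when the user changes the spec
	ObservedGeneration int64 // status.observedGeneration, recorded after a successful reconcile
}

// shouldApplySpecUpdates returns true only when the user has changed the
// CassandraDatacenter spec since the last reconcile. When the generations
// match, the operator would skip StatefulSet/spec updates and only run the
// remaining processing (status updates, decommissions, restarts).
func shouldApplySpecUpdates(dc datacenterMeta) bool {
	return dc.Generation != dc.ObservedGeneration
}

func main() {
	// Operator was upgraded but the user changed nothing: no spec push.
	fmt.Println(shouldApplySpecUpdates(datacenterMeta{Generation: 4, ObservedGeneration: 4})) // false
	// User edited the spec: apply the updates as before.
	fmt.Println(shouldApplySpecUpdates(datacenterMeta{Generation: 5, ObservedGeneration: 4})) // true
}
```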

A rolling restart should trigger the upgrade to the StatefulSet specs and the other changes we detect. While this is a side effect, there's an additional benefit: self-healing. At the moment, it is possible to modify the StatefulSet directly and we will not revert it, which could let users make the cluster unstable while the operator does nothing to fix the situation. The current rolling restart is done by modifying a PodTemplateSpec annotation on the StatefulSet that records the restart time (like kubectl's rollout restart does), so we need to catch even this small change. What we have to do is prevent a double rolling restart, first from the annotation change and then from the cass-operator change. Not sure how to do that in every case, but that should be the goal.
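A rough sketch of the idea, assuming the same annotation key that kubectl rollout restart uses (whether cass-operator keeps that key or uses its own is an open detail); the helper name is made up for illustration:

```go
// Sketch: request a rolling restart by stamping the pod template, while the
// caller folds any pending PodTemplateSpec changes into the same update so
// only one restart happens instead of two.
package main

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
)

// Assumption: same annotation key that `kubectl rollout restart` writes.
const restartedAtAnnotation = "kubectl.kubernetes.io/restartedAt"

// requestRollingRestart mutates the desired StatefulSet in place. If the
// desired object already carries the operator's pending template changes,
// applying them together with the restart stamp results in a single rollout.
func requestRollingRestart(desired *appsv1.StatefulSet, now time.Time) {
	if desired.Spec.Template.Annotations == nil {
		desired.Spec.Template.Annotations = map[string]string{}
	}
	desired.Spec.Template.Annotations[restartedAtAnnotation] = now.UTC().Format(time.RFC3339)
}

func main() {
	sts := &appsv1.StatefulSet{}
	requestRollingRestart(sts, time.Now())
}
```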

We will need a test for all of this using the "upgrade_operator" test (and we should make it work with newer Cassandra versions as well) to detect cases where an operator upgrade would do a rolling restart, and fail the build in those cases.
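One way the test could assert this (a sketch, not the existing test code): capture the StatefulSet before the operator upgrade and verify afterwards that neither the pod template nor the update revision changed, since either would mean a rolling restart was triggered.

```go
// Sketch of the assertion the upgrade_operator test could make.
package main

import (
	"fmt"
	"reflect"

	appsv1 "k8s.io/api/apps/v1"
)

// restartedDuringUpgrade reports whether the operator upgrade caused a
// rollout: a new UpdateRevision or a changed pod template both mean the
// pods will be (or were) restarted.
func restartedDuringUpgrade(before, after *appsv1.StatefulSet) bool {
	return before.Status.UpdateRevision != after.Status.UpdateRevision ||
		!reflect.DeepEqual(before.Spec.Template, after.Spec.Template)
}

func main() {
	before := &appsv1.StatefulSet{}
	after := before.DeepCopy()
	fmt.Println(restartedDuringUpgrade(before, after)) // false: upgrade was safe
}
```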

Possible issues:

  • An update to the operator might genuinely require an update to server-system-logger, for example because the operator can't function without those modifications. However, the same problem could arise with management-api versions as well, and we already have detection for that. So a similar approach should work, and we could simply emit events and log messages indicating to users that they really should do a restart at some point.
  • How many upgrades can we accept? What if users keep updating the operator, but the cluster hasn't been restarted in years? Probably a rare issue, but not impossible. Can we still maintain a cluster that is based on a configuration a very old cass-operator version created? Should we add an annotation that indicates which version of cass-operator created the resources, so we could detect such cases (see the sketch after this list)? It could fit into the existing "created-by / managed-by" annotations without making a huge difference. We would treat a missing annotation as 1.16.0, for example.
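A sketch of how that version annotation could be read; the annotation key is hypothetical, and the fallback to "1.16.0" follows the idea above that a missing annotation is treated as the last version before the annotation existed:

```go
// Sketch: reading which cass-operator version created the resources, so we
// can detect clusters whose StatefulSets were built by a very old operator.
package main

import "fmt"

// Hypothetical annotation key, not an existing cass-operator annotation.
const createdByVersionAnnotation = "cassandra.datastax.com/operator-version"

func operatorVersionFor(annotations map[string]string) string {
	if v, ok := annotations[createdByVersionAnnotation]; ok && v != "" {
		return v
	}
	// No annotation: the resource predates this feature, assume 1.16.0.
	return "1.16.0"
}

func main() {
	fmt.Println(operatorVersionFor(nil))                                                     // "1.16.0"
	fmt.Println(operatorVersionFor(map[string]string{createdByVersionAnnotation: "1.22.1"})) // "1.22.1"
}
```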

Why is this needed?

Updating the operator should be safe for the cluster - always. We should not make people run old versions of cass-operator because they're afraid of updating or have no way to restart all their clusters. Multiple clusters managed by the same cass-operator are especially problematic, as restarting all of them at once would cause heavy load on the underlying Kubernetes cluster.


Tentatively supported. I'll look forward to watching your discussion, but it sounds like you've thought this through.

I do think perhaps we should log errors or something if the user has done 1-2 consecutive upgrades without a rolling restart, but I concur that they should always be in control of the timing of the restart itself.

@burmanm, since this ticket is sized L, could you split it up into multiple subtasks? We'll make this one an epic.
Thanks!

  • We would introduce a new annotation on the CassandraDatacenter objects that allows self-healing of the StatefulSets, thus disabling the new behavior of not automatically fixing them
  • In order to apply the upgrade changes, we can rely on a new annotation cassandra.datastax.com/allow-sts-upgrade=true

These two sound closely related. To keep things simple, maybe we could use a single annotation with different possible values: cassandra.datastax.com/allow-sts-upgrade=always|once

We could; "once" would be removed after applying it, while "always" would stay in that case.
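A minimal sketch of consuming that annotation with the "always|once" scheme discussed above; only the annotation key and values come from this thread, the helper name is made up for illustration:

```go
// Sketch: interpreting the proposed cassandra.datastax.com/allow-sts-upgrade
// annotation on a CassandraDatacenter.
package main

import "fmt"

const allowStsUpgradeAnnotation = "cassandra.datastax.com/allow-sts-upgrade"

// stsUpgradeAllowed reports whether spec upgrades may be pushed to the
// StatefulSets, and whether the annotation should be removed once applied.
func stsUpgradeAllowed(annotations map[string]string) (allowed bool, removeAfterApply bool) {
	switch annotations[allowStsUpgradeAnnotation] {
	case "always":
		return true, false // keep the annotation; StatefulSets also self-heal
	case "once":
		return true, true // apply pending changes, then drop the annotation
	default:
		return false, false // default: operator upgrades never touch the StS spec
	}
}

func main() {
	allowed, removeAfter := stsUpgradeAllowed(map[string]string{allowStsUpgradeAnnotation: "once"})
	fmt.Println(allowed, removeAfter) // true true
}
```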

From the other operators' side:

If both k8ssandra-operator and cass-operator have upgrades, we might need only k8ssandra-operator to apply a new CassandraDatacenter. But we might also have a case where only cass-operator has updates and k8ssandra-operator doesn't.

So our K8ssandraTask needs more logic than the CassandraTask.