graphops/launchpad-charts

graph-node query nodes should be deployments

Closed this issue · 5 comments

... and not statefulsets.

That would allow for smooth rolling restarts even when replicaCount: 1, as well as a more fluid node assignment.

@aasseman thank you for raising this :) Could I bother you to expand a bit into what are the specific behaviors that the Stateful Set is exhibiting and would be improved with the Deployment kind?

We are considering this change, it will not however be as trivial or as quickly implemented as in some other charts given how the graph chart is structured (a single all.yaml template that does a logic-heavy templating, and the fact that you can freely define node groups, or not even use node groups that are specifically for querying)

Rolling restart example in the extreme case of replicas=1

Deployment:

  1. New pod with updated image is created.
  2. Wait for the new pod to be ready
  3. Traffic is shifted to the new pod.
  4. Old pod is terminated. (No downtime)

StatefulSet:

  1. Old pod is stopped. (so we're at 0 pods now).
  2. New pod (with same identity as the old one) is started. But not necessarily ready yet...
  3. New pod is ready, service is restored

If there are more replicas, the situation is similar, although you won't have downtime in the case of the statefulset, but still N-1 replicas ready while pods are stopped and started one by one.

In the deployments case, you will still have N replicas ready and serving at any point in time during the rolling restart.

Deployments make it also easier for autoscalers of stuff like Descheduler to move deployment's pods around without any fuss to optimize the balancing of workloads between nodes.

Thank you @aasseman! that makes a lot of sense. Don't see any way of changing that behavior on a Stateful Set.
On that same scenario, this could only apply to query nodes, as indexing nodes or the ingestor one need stable names, and you never want two of the same identity to run in parallel. Now, query nodes are still just graph nodes, and their role can be changed at run time by changing the config.. so potentially people can shoot themselves in the foot here if they start with a node that is query only, get a deployment by default, and then change that config.

So at the very least, gonna look into a way of letting users turn some flag or otherwise specify they want a Deployment kind for certain node-groups, still not 100% sure if we will change the default.

Yes, deployment can only work for query-nodes.