Deleting stale silos that are marked active but the pod has already been removed
Opened this issue · 6 comments
When a pod exits prematurely, its silo entry remains in etcd with the status "Active". When the pod restarts, it scans all silo entries and attempts to connect to its now-dead predecessor. I've noticed this behaviour several times when running Orleans 3 in production in a single-silo cluster.
The Orleans.Hosting.Kubernetes package handles this: https://dotnet.github.io/orleans/docs/deployment/kubernetes.html
It's still in beta, but if you can try it and provide feedback, that would be helpful.
Thanks! I will try it out in the next few days 🙂
@ReubenBond the package you directed @TFarla to doesn't do clustering. Could you tell me what the benefit of using it is?
Or should it be used in tandem?
@turowicz the hosting package does a few things:
- Configure silos based on the pod's environment (IP, name, ClusterId/ServiceId)
- Monitor Kubernetes for changes in active pods, so that deleted pods can be removed immediately, without the need for health probes. This doesn't remove the need for health probes altogether, but provides a short-cut in case an administrative action was taken.
- Kill pods in Kubernetes when they are declared dead by the cluster
It doesn't replace the need for a clustering provider. The sample here shows how to use it, and uses Redis for clustering (but something else could be used, of course) https://github.com/ReubenBond/hanbaobao-web
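To make the division of responsibilities concrete, here is a minimal sketch of a silo configured the way the linked sample describes: Kubernetes hosting for pod awareness plus a separate clustering provider (Redis in this case). It assumes the `Microsoft.Orleans.Hosting.Kubernetes` and Redis clustering packages are referenced, and the `redis:6379` connection string is a placeholder.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Orleans.Hosting;

public static class Program
{
    public static Task Main() =>
        new HostBuilder()
            .UseOrleans(siloBuilder =>
            {
                // Configures the silo from the pod's environment
                // (IP, name, ClusterId/ServiceId) and watches the
                // Kubernetes API so deleted pods are removed promptly.
                siloBuilder.UseKubernetesHosting();

                // Clustering is still provided separately; here Redis,
                // as in the hanbaobao-web sample, but any membership
                // provider would do.
                siloBuilder.UseRedisClustering("redis:6379");
            })
            .Build()
            .RunAsync();
}
```

The key point from the list above: `UseKubernetesHosting` complements the membership provider rather than replacing it.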
@ReubenBond thanks! I'm now testing Orleans.Clustering.Kubernetes together with Orleans.Hosting.Kubernetes. It seems like this combo will work well.
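For reference, the combination being tested here would look roughly like the following sketch, assuming the OrleansContrib `Orleans.Clustering.Kubernetes` package (whose membership extension I'm taking to be `UseKubeMembership`) alongside the official hosting package — check each package's README for the exact method names in your version.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Orleans.Hosting;

public static class Program
{
    public static Task Main() =>
        new HostBuilder()
            .UseOrleans(siloBuilder =>
            {
                // Pod-environment configuration and pod monitoring
                // from Microsoft.Orleans.Hosting.Kubernetes.
                siloBuilder.UseKubernetesHosting();

                // Membership stored in Kubernetes custom resources,
                // from the OrleansContrib Orleans.Clustering.Kubernetes
                // package, so no external store (Redis, SQL) is needed.
                siloBuilder.UseKubeMembership();
            })
            .Build()
            .RunAsync();
}
```

With this pairing, the hosting package's pod monitoring should also clear the stale "Active" entries described at the top of this issue, since deleted pods are removed from membership without waiting for health probes.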