Deleting stale silos that are marked active but the pod has already been removed
Opened this issue · 6 comments
When a pod exits prematurely, its silo entry remains in etcd with the status "Active". When the pod restarts, it scans all silo entries and attempts to connect to its now-dead predecessor. I've noticed this behaviour several times when running Orleans 3 in production in a single-silo cluster.
The Orleans.Hosting.Kubernetes package handles this: https://dotnet.github.io/orleans/docs/deployment/kubernetes.html
It's still in beta, but if you can try it and provide feedback, that would be helpful.
Thanks! I will try it out in the next few days 🙂
@ReubenBond the package you directed @TFarla to doesn't do clustering. Could you tell me what the benefit of using it is?
Or should it be used in tandem?
@turowicz the hosting package does a few things:
- Configure silos based on the pod's environment (IP, name, ClusterId/ServiceId)
- Monitor Kubernetes for changes in active pods, so that deleted pods can be removed immediately, without the need for health probes. This doesn't remove the need for health probes altogether, but provides a short-cut in case an administrative action was taken.
- Kill pods in Kubernetes when they are declared dead by the cluster
It doesn't replace the need for a clustering provider. The sample here shows how to use it, and uses Redis for clustering (but something else could be used, of course) https://github.com/ReubenBond/hanbaobao-web
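To make the division of responsibilities concrete, here is a minimal sketch of a silo configured the way the linked sample describes: Kubernetes hosting for pod awareness plus a separate clustering provider (Redis in this case). It assumes the `Microsoft.Orleans.Hosting.Kubernetes` and Redis clustering packages are referenced, and the `redis:6379` connection string is a placeholder.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Orleans.Hosting;

public static class Program
{
    public static Task Main() =>
        new HostBuilder()
            .UseOrleans(siloBuilder =>
            {
                // Configures the silo from the pod's environment
                // (IP, name, ClusterId/ServiceId) and watches the
                // Kubernetes API so deleted pods are removed promptly.
                siloBuilder.UseKubernetesHosting();

                // Clustering is still provided separately; here Redis,
                // as in the hanbaobao-web sample, but any membership
                // provider would do.
                siloBuilder.UseRedisClustering("redis:6379");
            })
            .Build()
            .RunAsync();
}
```

The key point from the list above: `UseKubernetesHosting` complements the membership provider rather than replacing it.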
@ReubenBond thanks! I'm now testing Orleans.Clustering.Kubernetes together with Orleans.Hosting.Kubernetes. It seems like this combo will work well.
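For reference, the combination being tested here would look roughly like the following sketch, assuming the OrleansContrib `Orleans.Clustering.Kubernetes` package (whose membership extension I'm taking to be `UseKubeMembership`) alongside the official hosting package — check each package's README for the exact method names in your version.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Orleans.Hosting;

public static class Program
{
    public static Task Main() =>
        new HostBuilder()
            .UseOrleans(siloBuilder =>
            {
                // Pod-environment configuration and pod monitoring
                // from Microsoft.Orleans.Hosting.Kubernetes.
                siloBuilder.UseKubernetesHosting();

                // Membership stored in Kubernetes custom resources,
                // from the OrleansContrib Orleans.Clustering.Kubernetes
                // package, so no external store (Redis, SQL) is needed.
                siloBuilder.UseKubeMembership();
            })
            .Build()
            .RunAsync();
}
```

With this pairing, the hosting package's pod monitoring should also clear the stale "Active" entries described at the top of this issue, since deleted pods are removed from membership without waiting for health probes.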