Automatically handle errors on nodes and quarantine nodes
Closed this issue · 0 comments
cezarsa commented
Automatically handle errors on nodes. When an error happens, the cluster should try another node and move the failing node to some kind of quarantine zone.
The cluster should keep trying to use nodes in quarantine for some time. After a number of errors, a recovery process should be started, possibly spawning a new node and moving containers from one node to another.
The recovery process should be handled by the client, which would implement some kind of Recovery interface provided by the cluster.