tsuru/docker-cluster

Automatically handle errors on nodes and quarantine nodes

Closed this issue · 0 comments

Automatically handle errors on nodes. When an error happens, the cluster should try another node and move the failing node to some kind of quarantine zone.
The cluster should try to use nodes in quarantine for some time. After a number of errors, a recovery process should be started, possibly spawning a new node and moving containers from one node to another.
The recovery process should be handled by the client, which would implement some kind of Recovery interface provided by the Cluster.