Documentation of best practices for using S3 and automatic instance management
Bootstrapping three nodes with Exhibitor, using S3 for automatic instance management, has the odd property that all three nodes in the cluster keep restarting one another for roughly twenty minutes before finally settling into a healthy state. I have experimented extensively with the timeouts available in the configuration and haven't found a combination that speeds things up. At this point I'm still faced with a fairly long ZooKeeper bootstrap phase, and I'm considering just giving these nodes static IPs and passing them in via the servers-spec= property.
Is this the expected behavior when using S3 with automatic instance management? Does anyone have hints on how to get three ZooKeeper nodes to bootstrap properly, find one another, and settle into a healthy state in less than twenty minutes of seemingly unnecessary rolling restarts? Should I be passing in the initial servers-spec= property so that the cluster starts from a correct initial state?
Many thanks.
https://github.com/Banno/docker-zk-exhibitor/blob/master/include/wrapper.sh
When I was at Netflix, it would take 10-12 minutes to get a 5-node cluster fully reconfigured, so 20 minutes seems very long. Just as I was leaving Netflix I came to the conclusion that rolling config changes are inherently unsafe (see http://qnalist.com/questions/3983279/rolling-config-change-considered-harmful). My suggestion is to set "Apply All At Once" to true when using automatic server management. This will create a single event that is more disruptive, but it's just the one, instead of lots of little events.
That's a great read @Randgalt.
My bootstrap doesn't take 20 mins, more like 10. Here is my config if you're interested.
You might want to increase the settling period. Also, try setting a fixed ensemble size.
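For reference, the suggestions in this thread (apply all at once, a longer settling period, a fixed ensemble size) might look roughly like this in an Exhibitor defaults properties file. This is only a sketch: the key names are my assumption based on Exhibitor's configuration naming, the settling period value is purely illustrative, and the ensemble size of 3 simply matches the three nodes in the original question. Check the keys against the Exhibitor version you are running before relying on them.

```
# Assumed key names; verify against your Exhibitor version.
# Enable automatic instance management.
auto-manage-instances=1
# Apply ensemble changes in a single restart rather than rolling them out.
auto-manage-instances-apply-all-at-once=1
# Wait for the instance list to be stable this long before acting (illustrative value).
auto-manage-instances-settling-period-ms=180000
# Hold off on changes until this many instances have registered.
auto-manage-instances-fixed-ensemble-size=3
```

A settling period of 0 would let every newly registered instance trigger a reconfiguration right away, which is one plausible source of repeated restarts during bootstrap.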
All of this information is much appreciated. I've got some good leads to go on now. @stonefury thanks for the full config!