elastic/elasticsearch

A new cluster coordination layer

ywelsch opened this issue · 2 comments

The cluster state contains important metadata about the cluster, including what the mappings look like, what settings the indices have, which shards are allocated to which nodes, etc. Inconsistencies in the cluster state can have severe consequences, including inconsistent search results and data loss, and the job of the cluster state coordination subsystem is to prevent any such inconsistencies. Ideally this subsystem should also be easy to configure correctly, and it should perform well in a variety of situations.
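To make this concrete, here is a hypothetical sketch (not the real `ClusterState` class, and the field names are illustrative only) of the kind of metadata the cluster state carries, all of which every node must agree on:

```python
# Hypothetical sketch of cluster state contents; the real structure in
# Elasticsearch is a versioned, immutable Java object, not a dict.
cluster_state = {
    "version": 42,                 # monotonically increasing per change
    "master_node": "node-1",       # the currently elected master
    "metadata": {
        "indices": {
            "logs-2018": {
                "settings": {"number_of_shards": 3, "number_of_replicas": 1},
                "mappings": {"properties": {"message": {"type": "text"}}},
            }
        }
    },
    # routing table: which shard copies are allocated to which nodes
    "routing_table": {"logs-2018": {0: ["node-1", "node-2"]}},
}
```

If two nodes disagree on, say, the routing table, searches may return different results depending on which node is queried, which is exactly the class of inconsistency the coordination layer must rule out.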

The goal of this project is to rebuild the cluster state coordination subsystem, making it more reliable, performant and user-friendly. Better reliability will be achieved by basing the core algorithm on strong theoretical underpinnings and extensive testing. Misconfiguration of the minimum_master_nodes setting, one of the most common causes for cluster state inconsistencies, will be addressed by having this property fully managed by the system itself.
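The danger of misconfiguring `minimum_master_nodes` can be shown with simple quorum arithmetic. The following is an illustrative sketch (the function names are invented, not Elasticsearch APIs): the safe value is a strict majority of the master-eligible nodes, and any smaller value allows two disjoint partitions to each elect a master (split-brain).

```python
def majority(master_eligible_nodes: int) -> int:
    """Smallest number of votes that forms a strict majority (quorum)."""
    return master_eligible_nodes // 2 + 1

def can_split_brain(master_eligible_nodes: int, minimum_master_nodes: int) -> bool:
    """True if two disjoint partitions could each satisfy the setting
    simultaneously, i.e. each elect its own master."""
    return 2 * minimum_master_nodes <= master_eligible_nodes
```

For a 3-node cluster, `majority(3)` is 2; leaving the setting at 1 means a network partition into {A} and {B, C} can produce two masters, while setting it to 2 makes that impossible. Having the system manage this value itself removes the failure mode entirely.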

We've built a prototype to validate the approach and, based on our experience with this, present the following development roadmap for this new cluster coordination and consensus layer, targeting ES 7.0:

  • core algorithm: Adds term and voting configuration to cluster state (#32100) and directly implements the transition rules from the spec (#32171)
  • node discovery: builds list of peers based on UnicastHostProviders, establishes permanent connections and identifies active master nodes (#32246, #32642, #32643, #32939, #33012, #36603)
  • cluster state publication: Pipeline to publish a single cluster state to the other nodes in the cluster (#32584)
  • deterministic / unit-testable MasterService (& async Discovery.publish method): increases testability of MasterService and the discovery layer (#32493)
  • election scheduling (#32846) and prevoting (#32847), avoiding duelling elections
  • leader election & lifecycle modes (Candidate, Leader, Follower): introduces the basic lifecycle states that a node can go through (#33013, #33668)
  • node joining (voting and non-voting join), adding joining node to cluster state (#33013)
  • leader & follower failure detector (#33024, #33917, #34049, #34147)
  • deterministic testing of cluster formation (#33668, #33713, #33991, #34002, #34039, #34181, #34241)
  • term bumping, ensuring all followers have voted for the leader (#34335, #34346)
  • node leaving cluster on disconnect or failure (#34503)
  • auto-reconfiguration rules: provides the basis for the cluster to stay highly available by shifting votes from unavailable to available nodes (#33924, #34592, #35217)
  • transport layer: transport actions & mock transport for unit testing (#33713)
  • storage layer: diff-based storage for the cluster state and current term (#33958)
  • cluster state application (& acking): Apply a committed cluster state on each node and acknowledge that it has been applied (#34257, #34315)
  • cluster bootstrap method to inject an initial state + configuration (#34345, #34961, #35847)
  • voting configuration exclusions API, which allows a cluster to be safely scaled down from 2 nodes to 1 (#35446, #36007, #36226)
  • auto-bootstrapping and auto-scaling in integration tests based on bootstrap / voting configuration exclusions API (#34365, #35446, #35488, #35678, #35724)
  • diff-based cluster state publishing (#35290, #35684)
  • lag detection: remove nodes from cluster when they fall too far behind the master (#35685)
  • state recovery / recover_after* settings (#36013)
  • support for (rolling) upgrades from 6.x (#35443, #35737)
  • best-effort auto-bootstrapping on unconfigured discovery to provide good OOTB experience (#36215)
  • correctly respect the no_master_block setting (#36478)
  • Introduce deterministic task queue (#32197)
  • Randomized testing of CoordinationState (#32242)
  • Fix JoinTaskExecutor identity issue (#32911)
  • Fix node logging issue (#33929)
  • Output voting tombstones in XContent representation of cluster state (#35832)
  • Add zen2 discovery type (#36298)
  • integration with master-ineligible nodes (#35996, #36247)
  • Full cluster restart upgrade: initial election does not use the proper cluster state version (use metadata version instead) (#37122)
  • Add restarts to CoordinatorTests (#37296)
  • node join validation (#37203)
  • Migrate Zen2 unit tests from InMemoryPersistedState to GatewayMetaState (#36897)
  • Relax bootstrapping to work on discovery of a quorum of the nodes, rather than on all of them. Use a placeholder ID for the unknown nodes. (#37463)
  • unsafe recover API / command line tool: To be used when a quorum of master-eligible nodes has been permanently lost (#37696, #37979, #37812)
  • handling dangling indices and nodes that were previously part of another cluster (#37775)
  • Only allow nodes with an active vote to become master (#37712, #37802)
  • Additional bootstrapping methods? Check whether we have a good story for all typical deployment systems (docker, kubernetes, ...)
  • Security model (voting exclusions API associated with manage cluster privilege)
  • Remove the need for minimum_master_nodes in a rolling upgrade, instead using the minimum_master_nodes from the previous master for bootstrapping. (#37701)
  • Bubble exceptions up in ClusterApplierService (#37729)
  • Prioritize publishing cluster state to master-eligible nodes (#37673)
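The core algorithm, term bumping, and election items above all revolve around the same two rules: a node grants at most one vote per term (and only for a term newer than any it has seen), and a candidate wins an election only with votes from a quorum of the current voting configuration. A minimal sketch of those rules, with invented names rather than the actual `CoordinationState` API:

```python
class Node:
    """A master-eligible node's view of the current term (illustrative only)."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.current_term = 0

    def handle_vote_request(self, candidate_term: int) -> bool:
        """Grant a vote only for a term newer than any seen so far;
        bumping current_term ensures at most one vote per term."""
        if candidate_term > self.current_term:
            self.current_term = candidate_term
            return True
        return False


def election_won(votes: set, voting_configuration: set) -> bool:
    """A candidate needs a quorum (strict majority) of the voting
    configuration; votes from nodes outside it do not count."""
    return 2 * len(votes & voting_configuration) > len(voting_configuration)
```

This is also why the auto-reconfiguration rules matter: by shifting the voting configuration toward available nodes, the system keeps a reachable quorum even as nodes come and go.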

After 7.0 FF:

  • Deprecate any Zen1-specific settings and rename any others that mention zen but which are still in use. (#38289, #38333, #38350)
  • Make discovery.type non-configurable/internal-only / move Zen1 to tests only (#39466)
  • Scaling tests (e.g. election clashes when having large cluster states)
  • Do not close bad indices on state recovery (#39500)
  • Add stats (e.g. expose stuff like node term, or discovery information while the node has troubles forming / joining a cluster) (#35993)
  • Contemplate timeouts, retries, etc. and consider improvements to default values (#38298)
  • Check logged messages are useful and at the appropriate levels (#39756, #39950).
  • Docs 📜 (#34714, #36959, #36954, #36942, #36909), plus docs for full-cluster and rolling upgrades

Post 7.0:

  • Smoother master failovers by not exposing them to the ClusterApplierService, i.e., delay putting up a NO_MASTER_BLOCK.
  • Abdicate on leader shutdown (appoint new leader)
  • Add "has_voting_exclusions" flag to cluster health output (#38568)
  • Make enqueueing of cluster state updates behave as well as possible in an overloaded cluster.
  • Verify that a master which cannot write its cluster state stands down (or maybe actively abdicates)
  • Deal appropriately with duplicate nodes (see e.g. #32904)
  • High-level rest client integration for new APIs
  • Avoid bootstrapping if any discovered peer has a nonzero term
  • Work with support to enhance cluster diagnostics analysis tool.

Pinging @elastic/es-distributed

Closing this one as shipped in 7.0. Possible follow-ups will be tracked separately.