/elasticsearch-formal-models

TLA+ Model of the Elasticsearch data replication approach

Primary LanguageIsabelleApache License 2.0Apache-2.0

Formal models of core Elasticsearch algorithms

This repository contains formal models of core Elasticsearch algorithms and is directly related to implementation efforts around data replication and cluster coordination. The models in this repository might represent past, current and future designs of Elasticsearch and can differ to their implementations in substantial ways. The formal models mainly serve to illustrate some of the high-level concepts and help to validate resiliency-related aspects.

Models

Cluster coordination model

The cluster coordination TLA+ model ensures the consistency of cluster state updates and represents the core cluster coordination and metadata replication algorithm implemented in Elasticsearch 7.0. It consists of two files:

Data replication model

The data replication TLA+ model describes the Elasticsearch sequence number based data replication approach, implemented since Elasticsearch 6.0, which consists of two files:

Replica engine

A TLA+ model of how the engine handles replication requests.

Alternative cluster coordination model

The alternative cluster coordination TLA+ model consists of two files:

The alternative cluster consensus Isabelle model consists of the following theories:

How to edit/run TLA+:

How to run the model checker in headless mode:

  • Download tla2tools.jar
  • Run the model checker once in TLA+ Toolbox on desktop (can be aborted once started). This generates the folder elasticsearch.toolbox/model/ that contains all model files that are required to run the model checker in headless mode.
  • Copy the above folder and tla2tools.jar to the server running in headless mode.
  • cd to the folder and run java -Xmx30G -cp ../tla2tools.jar tlc2.TLC MC -deadlock -workers 12. The setting -Xmx30G denotes the amount of memory to allocate to the model checker and -workers 12 the number of worker threads (should be equal to the number of cores on machine). The setting -deadlock ensures that TLC explores the full reachable state space, not searching for deadlocks.