Rust Raft with improvements

Openraft

Advanced Raft using the Tokio framework. Please ⭐ on GitHub!

Raft is not yet good enough. This project intends to improve raft as the next generation consensus protocol for distributed data storage systems (SQL, NoSQL, KV, Streaming, Graph ... or maybe something more exotic).

Currently, Openraft is the consensus engine of the meta-service cluster in Databend.

  • Get started: The guide is the best place to get started, followed by the docs for more in-depth details.

  • The Openraft API is not stable yet. Before 1.0.0, an upgrade may contain incompatible changes. Check our change-log.

  • Openraft is derived from async-raft with several bugs fixed:

List of fixed bugs:
  • Fixed: 6c0ccaf3 consider joint config when starting up and committing.; by drdr xp; 2021-12-24
  • Fixed: 228077a6 a restarted follower should not wait too long to elect. Otherwise, the entire cluster hangs; by drdr xp; 2021-11-19
  • Fixed: a48a3282 handle-vote should compare last_log_id in dictionary order, not in vector order; by drdr xp; 2021-09-09
  • Fixed: eed681d5 race condition of concurrent snapshot-install and apply.; by drdr xp; 2021-09-01
  • Fixed: 9540c904 when append-entries, deleting entries after prev-log-id causes committed entry to be lost; by drdr xp; 2021-08-31
  • Fixed: 6d53aa12 too many (50) inconsistent logs should not live-lock append-entries; by drdr xp; 2021-08-31
  • Fixed: 4d58a51e a non-voter not in joint config should not block replication; by drdr xp; 2021-08-31
  • Fixed: 8cd24ba0 RaftCore.entries_cache is inconsistent with storage. removed it.; by drdr xp; 2021-08-23
  • Fixed: 2eccb9e1 install snapshot req with offset GE 0 should not start a new session.; by drdr xp; 2021-08-22
  • Fixed: eee8e534 snapshot replication does not need to send a last 0 size chunk; by drdr xp; 2021-08-22
  • Fixed: beb0302b leader should not commit when there is no replication to voters.; by drdr xp; 2021-08-18
  • Fixed: dba24036 after 2 log compactions, membership should be extractable from the previous compaction log; by drdr xp; 2021-07-14
  • Fixed: 447dc11c when finalize_snapshot_installation, memstore should not load membership from its old logs that are going to be overridden by the snapshot.; by drdr xp; 2021-07-13
  • Fixed: cf4badd0 leader should re-create and send snapshot when threshold/2 < last_log_index - snapshot < threshold; by drdr xp; 2021-07-08
  • Fixed: d60f1e85 client_read was using the wrong quorum=majority-1; by drdr xp; 2021-07-02
  • Fixed: 11cb5453 doc-include can only be used in nightly build; by drdr xp; 2021-06-16
  • Fixed: a10d9906 in handle_update_match_index(), a non-voter should also be considered, because during a membership change a non-voter also counts as a quorum member; by drdr xp; 2021-06-16
  • Fixed: d882e743 when calculating quorum, the non-voter should be counted; by drdr xp; 2021-06-02
  • Fixed: 6202138f a conflict is expected even when appending empty entries; by drdr xp; 2021-05-24
  • Fixed: f449b64a discarded log in replication_buffer should be finally sent.; by drdr xp; 2021-05-22
  • Fixed: 6d680484 #112 : when a follower is removed, leader should stop sending log to it.; by drdr xp; 2021-05-21
  • Fixed: 89bb48f8 last_applied should be updated only when logs actually applied.; by drdr xp; 2021-05-20
  • Fixed: 39690593 a NonVoter should stay as NonVoter instead of Follower after restart; by drdr xp; 2021-05-14
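
To illustrate the a48a3282 fix listed above (comparing last_log_id in dictionary order rather than vector order), here is a minimal sketch. The `LogId` struct, its field names, and both helper functions are hypothetical and exist only for this example; they are not Openraft's actual types. Reading "vector order" as a component-wise comparison is also an assumption of this sketch.

```rust
/// Hypothetical log id for illustration only (not Openraft's type).
/// Deriving `Ord` compares fields in declaration order, so `term` is
/// compared first and `index` only breaks ties: dictionary order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LogId {
    pub term: u64,
    pub index: u64,
}

/// Dictionary-order check: the candidate's last log is at least as
/// up to date as the local one.
pub fn candidate_is_up_to_date(candidate: LogId, local: LogId) -> bool {
    candidate >= local
}

/// Component-wise ("vector order") check, shown only for contrast.
/// Two log ids can be incomparable under this rule.
pub fn vector_order_ge(a: LogId, b: LogId) -> bool {
    a.term >= b.term && a.index >= b.index
}
```

With a candidate at term 3, index 5 and a local log at term 2, index 9, the dictionary-order check grants the vote, while the component-wise check is false in both directions, so neither node would accept the other.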

A full list of changes/fixes can be found in the change-log.

Roadmap

  • Extended joint membership

  • Reduce the complexity of vote and pre-vote: get rid of the pre-vote RPC.

  • Reduce the conflict rate during elections; allow leadership to be taken within one term by a node with a greater node-id.

  • Support flexible quorums, e.g., Hierarchical Quorums.

  • Consider introducing read-quorum and write-quorum to improve efficiency in clusters with an even number of nodes.
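
The read-quorum/write-quorum roadmap item can be illustrated with a small arithmetic sketch. It assumes the classic quorum-intersection rules (every read quorum overlaps every write quorum, and any two write quorums overlap); this is not a description of Openraft's eventual design, and all function names are hypothetical.

```rust
/// Simple majority of `n` nodes.
pub fn majority(n: usize) -> usize {
    n / 2 + 1
}

/// Assumed safety rule for this sketch: a read quorum of size `r` and a
/// write quorum of size `w` always share a node (r + w > n), and any two
/// write quorums share a node (2w > n).
pub fn quorums_intersect(n: usize, r: usize, w: usize) -> bool {
    r <= n && w <= n && r + w > n && 2 * w > n
}

/// Smallest read quorum that still overlaps every write quorum of size `w`.
pub fn min_read_quorum(n: usize, w: usize) -> usize {
    n.saturating_sub(w) + 1
}
```

Under these assumptions, a 4-node cluster with a write quorum of 3 needs only 2 nodes for a read, whereas a 5-node cluster with a write quorum of 3 still needs 3. That gap is where an even-sized cluster could gain efficiency.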

Features

  • It is fully reactive and embraces the async ecosystem. It is driven by actual Raft events taking place in the system as opposed to being driven by a tick operation. Batching of messages during replication is still used whenever possible for maximum throughput.

  • Storage and network integration is well defined via two traits, RaftStorage and RaftNetwork. This gives applications maximum flexibility in choosing their storage and networking mediums.

  • All interaction with the Raft node is well defined via a single public Raft type, which is used to spawn the Raft async task, and to interact with that task. The API for this system is clear and concise.

  • Log replication is fully pipelined and batched for optimal performance. Log replication also uses a congestion control mechanism to help keep nodes up-to-date as efficiently as possible.

  • It fully supports dynamic cluster membership changes with joint config. The buggy single-step membership change algorithm is not considered. See the dynamic membership chapter in the guide.

  • Details on initial cluster formation, and how to effectively do so from an application's perspective, are discussed in the cluster formation chapter in the guide.

  • Automatic log compaction with snapshots, as well as snapshot streaming from the leader node to follower nodes is fully supported and configurable.

  • The entire code base is instrumented with tracing. This can be used for standard logging, or for distributed tracing, and the verbosity can be statically configured at compile time to completely remove all instrumentation below the configured level.
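
As a rough illustration of the joint-config membership change mentioned in the features above, the sketch below checks whether a set of acknowledging nodes forms a quorum in every configuration of a joint membership. The `NodeId` alias and both functions are hypothetical and written for this example only; they do not mirror Openraft's internal API.

```rust
use std::collections::BTreeSet;

/// Hypothetical node identifier used only in this sketch.
pub type NodeId = u64;

/// True if strictly more than half of `config` appears in `acked`.
pub fn is_majority(config: &BTreeSet<NodeId>, acked: &BTreeSet<NodeId>) -> bool {
    let granted = config.intersection(acked).count();
    granted > config.len() / 2
}

/// During a joint membership change, an entry counts as committed only
/// when a majority of EVERY configuration (e.g. old and new) has
/// acknowledged it. An empty list of configurations never reaches quorum.
pub fn joint_quorum_reached(configs: &[BTreeSet<NodeId>], acked: &BTreeSet<NodeId>) -> bool {
    !configs.is_empty() && configs.iter().all(|c| is_majority(c, acked))
}
```

For an old config {1, 2, 3} and a new config {3, 4, 5}, acknowledgements from {1, 3, 4} satisfy both majorities, while {1, 2} or {3, 4, 5} alone satisfy only one side and are rejected.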

Contributing

Check out the CONTRIBUTING.md guide for more details on getting started with contributing to this project.

License

Openraft is licensed under the terms of the MIT License or the Apache License 2.0, at your option.