/consensus_raft

The raft consensus component for CITA Cloud.

Primary language: Rust. License: Apache License 2.0 (Apache-2.0).

consensus_raft

An implementation of the consensus microservice for CITA-Cloud, based on raft-rs.

Build the docker image

docker build -t citacloud/consensus_raft .

Usage

$ consensus -h
consensus 6.7.0
Rivtower Technologies <contact@rivtower.com>

Usage: consensus [COMMAND]

Commands:
  run   run the service
  help  Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version

consensus-run

Run the consensus service.

$ consensus run -h
consensus-run
run the service

USAGE:
    consensus run [OPTIONS]

OPTIONS:
    -c, --config <config>                  the consensus config [default: config.toml]
    -d, --log-dir <log-dir>                the log dir. Overrides the config
    -f, --log-file-name <log-file-name>    the log file name. Overrride the config
    -h, --help                             Print help information
        --stdout                           if specified, log to stdout. Overrides the config

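Parameters:

  1. config: the microservice configuration file.

    See the example in example/config.toml.

    In that file:

    • controller_port is the port on which the gRPC service of the controller microservice (a dependency) listens.
    • grpc_listen_port is the port on which this microservice's own gRPC service listens.
    • network_port is the port on which the gRPC service of the network microservice (a dependency) listens.
    • node_addr is the path to the file that holds this node's address.
  2. log-dir: the directory that logs are written to.

  3. log-file-name: the name of the log file.

  4. --stdout: when this flag is omitted, logs are written to a file; when it is passed, logs are written to standard output.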
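As a rough illustration of the four config keys listed above, the sketch below models them as a Rust struct and reads them from flat `key = value` lines. The `ServiceConfig` type, the `parse_config` helper, and the hand-rolled parsing are assumptions made for this example only; the actual service loads example/config.toml with its own loader, and the real file may be laid out differently.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the four keys described above (not the service's real type).
#[derive(Debug, PartialEq)]
struct ServiceConfig {
    controller_port: u16,
    grpc_listen_port: u16,
    network_port: u16,
    node_addr: String,
}

/// Reads flat `key = value` lines. This is a deliberately tiny stand-in for a
/// TOML parser: comments and `[section]` headers are skipped, quotes are stripped.
fn parse_config(text: &str) -> Result<ServiceConfig, String> {
    let mut kv: HashMap<String, String> = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') || line.starts_with('[') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("not a key = value line: {}", line))?;
        kv.insert(
            key.trim().to_string(),
            value.trim().trim_matches('"').to_string(),
        );
    }
    // Look up a key and parse it as a port number.
    let port = |key: &str| -> Result<u16, String> {
        kv.get(key)
            .ok_or_else(|| format!("missing key: {}", key))?
            .parse::<u16>()
            .map_err(|e| format!("{}: {}", key, e))
    };
    Ok(ServiceConfig {
        controller_port: port("controller_port")?,
        grpc_listen_port: port("grpc_listen_port")?,
        network_port: port("network_port")?,
        node_addr: kv
            .get("node_addr")
            .cloned()
            .ok_or_else(|| "missing key: node_addr".to_string())?,
    })
}
```

Calling `parse_config` on text containing `controller_port = 50004`, `network_port = 50000`, `grpc_listen_port = 50001`, and `node_addr = "node_address"` yields a populated `ServiceConfig`. The first two port values match the log output shown in this README; the last two values are placeholders invented for the example.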

Logging to a file:

$ consensus run -c example/config.toml -d . -f consensus.log
$ cat consensus.log
Mar 14 08:32:55.131 INFO controller grpc addr: http://127.0.0.1:50004, tag: controller, module: consensus::client:45
Mar 14 08:32:55.131 INFO network grpc addr: http://127.0.0.1:50000, tag: network, module: consensus::client:167
Mar 14 08:32:55.131 INFO registering network msg handler..., tag: network, module: consensus::client:191

Logging to standard output:

$ consensus run -c example/config.toml --stdout
Mar 14 08:34:00.124 INFO controller grpc addr: http://127.0.0.1:50004, tag: controller, module: consensus::client:45
Mar 14 08:34:00.125 INFO network grpc addr: http://127.0.0.1:50000, tag: network, module: consensus::client:167
Mar 14 08:34:00.125 INFO registering network msg handler..., tag: network, module: consensus::client:191

Design

See ConsensusService and Consensus2ControllerService in cita_cloud_proto, which define the services that consensus should implement.

The main workflow of the consensus service is as follows:

  1. Get proposal either from the local controller or from other remote consensus peers.
  2. If the proposal comes from peers, ask the local controller to check it first.
  3. Achieve consensus over the given proposal.
  4. Commit the proposal with its proof to the local controller.

The proof is, for example, the nonce in PoW consensus, and it is empty for non-Byzantine consensus such as this raft implementation. Peers' controllers use it later to validate the corresponding block when they sync missing blocks from others.
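The four workflow steps above can be sketched as a single function. This is a minimal illustration, not the service's code: the `Controller` trait, `Source`, `Outcome`, and `process_proposal` are hypothetical names, the controller is an in-process trait rather than the gRPC service defined in cita_cloud_proto, and the consensus round is reduced to a closure that returns the proof.

```rust
/// Hypothetical stand-in for the local controller. The real interface is the
/// gRPC service defined in cita_cloud_proto.
trait Controller {
    fn check_proposal(&self, height: u64, data: &[u8]) -> bool;
    fn commit_block(&mut self, height: u64, data: &[u8], proof: &[u8]);
}

/// Step 1: a proposal arrives from the local controller or from a remote peer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    LocalController,
    RemotePeer,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Committed,
    Rejected,
}

fn process_proposal<C: Controller>(
    controller: &mut C,
    source: Source,
    height: u64,
    data: &[u8],
    reach_consensus: impl FnOnce(&[u8]) -> Vec<u8>,
) -> Outcome {
    // Step 2: proposals from peers are checked by the local controller first.
    if source == Source::RemotePeer && !controller.check_proposal(height, data) {
        return Outcome::Rejected;
    }
    // Step 3: run consensus over the proposal; the closure returns the proof
    // (empty for raft).
    let proof = reach_consensus(data);
    // Step 4: commit the proposal together with its proof.
    controller.commit_block(height, data, &proof);
    Outcome::Committed
}

/// Test double: rejects empty proposals and records every commit.
struct MockController {
    committed: Vec<(u64, Vec<u8>, Vec<u8>)>,
}

impl Controller for MockController {
    fn check_proposal(&self, _height: u64, data: &[u8]) -> bool {
        !data.is_empty()
    }

    fn commit_block(&mut self, height: u64, data: &[u8], proof: &[u8]) {
        self.committed.push((height, data.to_vec(), proof.to_vec()));
    }
}
```

With `MockController`, a non-empty proposal from `Source::RemotePeer` ends up in `committed` with an empty proof, while an empty one is rejected before consensus starts. A proposal from `Source::LocalController` skips the check, since the controller produced it.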

To communicate with other peers, you need to:

  1. Implement the NetworkMsgHandlerService which handles the messages from peers.
  2. Register your service with the network via RegisterNetworkMsgHandler, which tells the network to forward the messages you are interested in.

After all of that, you can send your messages to others by SendMsg or Broadcast provided by the network service.
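The register-then-forward pattern described above can be sketched in-process as follows. Everything here is an assumption made for illustration: `NetworkMsg`, `MsgHandler`, `MockNetwork`, and `ConsensusHandler` are hypothetical types, and the real network microservice does this over gRPC with the message types defined in cita_cloud_proto.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

/// Hypothetical message shape; the real one is defined in cita_cloud_proto.
#[derive(Debug, Clone, PartialEq)]
struct NetworkMsg {
    module: String,
    origin: u64,
    payload: Vec<u8>,
}

/// Stand-in for NetworkMsgHandlerService: receives messages sent by peers.
trait MsgHandler {
    fn process_network_msg(&mut self, msg: NetworkMsg);
}

/// In-process stand-in for the network service.
struct MockNetwork {
    handlers: HashMap<String, Box<dyn MsgHandler>>,
}

impl MockNetwork {
    fn new() -> Self {
        MockNetwork { handlers: HashMap::new() }
    }

    /// Mirrors the idea of RegisterNetworkMsgHandler: tell the network which
    /// module's messages should be forwarded to this handler.
    fn register_network_msg_handler(&mut self, module: &str, handler: Box<dyn MsgHandler>) {
        self.handlers.insert(module.to_string(), handler);
    }

    /// Forward an incoming message to the handler registered for its module.
    /// Returns false when nobody registered for that module.
    fn deliver(&mut self, msg: NetworkMsg) -> bool {
        match self.handlers.get_mut(&msg.module) {
            Some(handler) => {
                handler.process_network_msg(msg);
                true
            }
            None => false,
        }
    }
}

/// A handler that queues messages for the consensus loop to pick up.
struct ConsensusHandler {
    inbox: Sender<NetworkMsg>,
}

impl MsgHandler for ConsensusHandler {
    fn process_network_msg(&mut self, msg: NetworkMsg) {
        // Ignore the error that occurs if the receiving side was dropped.
        let _ = self.inbox.send(msg);
    }
}
```

Handing messages to a channel keeps the handler itself trivial, and the consensus loop drains the receiving end at its own pace. Messages for modules without a registered handler are simply not forwarded.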

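Implementation

raft-rs provides only the core Consensus Module; the other components, including the Log, State Machine, and Transport, must be implemented by the application.

  • Storage
    RaftStorage is implemented on top of the Storage trait.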

    • RaftStorage

    ```rust
    // Each trait method appears to delegate to an inherent method of the same
    // name on RaftStorage. Inherent methods take precedence over trait methods
    // in method resolution, so these calls do not recurse.
    impl Storage for RaftStorage {
        fn initial_state(&self) -> raft::Result<RaftState> {
            Ok(self.initial_state())
        }

        fn first_index(&self) -> raft::Result<u64> {
            Ok(self.first_index())
        }

        fn last_index(&self) -> raft::Result<u64> {
            Ok(self.last_index())
        }

        fn term(&self, idx: u64) -> raft::Result<u64> {
            self.term(idx)
        }

        fn entries(
            &self,
            low: u64,
            high: u64,
            max_size: impl Into<Option<u64>>,
        ) -> raft::Result<Vec<Entry>> {
            self.entries(low, high, max_size)
        }

        fn snapshot(&self, request_index: u64) -> raft::Result<Snapshot> {
            self.snapshot(request_index)
        }
    }
    ```

  • Log and State Machine

    The figure below shows how raft works:

    [Figure: raft]

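    Raft is a state machine model based on log replication. A client sends a write operation to the Leader on the server side; the Leader appends the operation to its Log and replicates it to all Followers. Once more than half of the nodes have acknowledged the entry, the operation can be applied to the State Machine.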
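The "more than half of the nodes" rule can be made concrete with a small calculation. The sketch below is a simplified illustration with a hypothetical helper name, not raft-rs code: given the highest log index known to be stored on each voter (the Leader included), it returns the largest index that a majority holds. Raft additionally requires that a Leader only commits entries from its current term this way; that check is left out here.

```rust
/// Largest log index that is stored on a majority of voters.
/// `match_index` holds one value per voter, the Leader included.
fn quorum_index(match_index: &[u64]) -> Option<u64> {
    if match_index.is_empty() {
        return None;
    }
    let mut sorted = match_index.to_vec();
    // Sort in descending order: position k then holds an index that at
    // least k + 1 voters have reached.
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let majority = match_index.len() / 2 + 1;
    Some(sorted[majority - 1])
}
```

For three voters at indices 5, 3, and 4, the result is 4: two of the three nodes hold every entry up to index 4, so those entries can be applied, while entry 5 still lives on the Leader alone.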

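    Log replication keeps the Log order identical on all nodes, and its purpose is to keep the data in the State Machine consistent. The Log grows as data accumulates, so real applications compact it at suitable moments: they take a snapshot of the current State Machine state, use it as the new base of the application data, and start recording the Log afresh. In a typical Raft application the Log data is light and the State Machine data is heavy, so snapshots are expensive and should not be taken often.

    This implementation, however, is the consensus module of a blockchain system, and its main interest is in using Raft's Consensus Module. The State Machine data is the ConsensusConfig, not the actual blockchain state; it exists only to keep the Consensus Module running correctly. The Log data consists of Proposals, so here it is the Log that is heavy by comparison. Taking advantage of this property and of how log compaction works, the approach is: after every Proposal is applied, take a snapshot of the State Machine state and save it locally, and keep clearing the Proposals that have already been applied. Both data consistency (when the requested Log entries are no longer available, a snapshot is used to sync) and state recovery after a restart (from the locally saved snapshot) are achieved through snapshots.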
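The snapshot-after-every-apply scheme can be sketched with a toy node. This is a simplified model under loud assumptions, not the actual RaftStorage: the `Node`, `Snapshot`, and `LogError` types are hypothetical, the config is reduced to a list of voter ids, terms are ignored, and every appended entry is treated as already committed.

```rust
use std::collections::VecDeque;

/// Snapshot of the (small) state machine: the config as of `index`.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    index: u64,
    config: Vec<u64>, // hypothetical: voter ids only
}

#[derive(Debug, PartialEq)]
enum LogError {
    /// The requested entries were dropped; the caller must sync via snapshot.
    Compacted,
}

struct Node {
    /// Proposals not yet applied; the front entry has index `snapshot.index + 1`.
    log: VecDeque<Vec<u8>>,
    config: Vec<u64>,
    snapshot: Snapshot,
}

impl Node {
    fn new(config: Vec<u64>) -> Self {
        let snapshot = Snapshot { index: 0, config: config.clone() };
        Node { log: VecDeque::new(), config, snapshot }
    }

    /// Restart recovery: rebuild the node from the locally saved snapshot.
    fn restore(snapshot: Snapshot) -> Self {
        Node { log: VecDeque::new(), config: snapshot.config.clone(), snapshot }
    }

    fn last_index(&self) -> u64 {
        self.snapshot.index + self.log.len() as u64
    }

    fn append(&mut self, proposal: Vec<u8>) -> u64 {
        self.log.push_back(proposal);
        self.last_index()
    }

    /// Apply the oldest proposal, then immediately snapshot and drop it.
    fn apply_next(&mut self) -> Option<Vec<u8>> {
        let proposal = self.log.pop_front()?;
        // A config-change entry would update `self.config` here.
        self.snapshot = Snapshot {
            index: self.snapshot.index + 1,
            config: self.config.clone(),
        };
        Some(proposal)
    }

    /// Entries starting at `low`; fails once they have been compacted away.
    fn entries_from(&self, low: u64) -> Result<Vec<Vec<u8>>, LogError> {
        if low <= self.snapshot.index {
            return Err(LogError::Compacted);
        }
        let skip = (low - self.snapshot.index - 1) as usize;
        Ok(self.log.iter().skip(skip).cloned().collect())
    }
}
```

Because the snapshot holds only the small config, taking one per applied Proposal is cheap, and the heavy Proposals never accumulate in the Log. A peer that asks for an already-applied index gets `LogError::Compacted` and catches up from the snapshot instead.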

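  • Transport

    This capability is provided by network.

  • Startup and run flow
    [Figure: setup]

    The handle ready step of the run flow is implemented as described in the raft-rs documentation.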