chaintope/tapyrus-signer

Signer process panics when the tapyrus-core process is down.

Yamaguchi opened this issue · 5 comments

When I stopped tapyrus-core, the signer node (master) panicked.

[2020-02-13T01:18:11Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=2
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: JsonRpc(Hyper(Io(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))', src/libcore/result.rs:1188:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

The revision is a912eb7.

This is the current specification. When the RPC connection to tapyrus-core fails, the node stops, because without a connection the node can't do anything.

However, I think we need to add a retry mechanism for momentary disconnections.
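A retry for momentary disconnections could look roughly like the sketch below. It is not the code in tapyrus-signer: the `retry` helper, its parameters, and the fixed delay are all assumptions made for illustration. It only shows the general idea of handling the `Err` instead of calling `unwrap()` on it.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not part of tapyrus-signer).
/// Calls `op` until it succeeds or `max_attempts` calls have been made,
/// sleeping `delay` between attempts. `op` is always called at least once.
/// On failure the error from the last attempt is returned to the caller,
/// who can then decide whether to stop the node or skip the round.
fn retry<T, E, F>(max_attempts: u32, delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: give the last error back instead of panicking.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Possibly a momentary disconnection: wait and try again.
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```

The RPC call would be passed in as the closure, for example `retry(3, Duration::from_secs(1), || rpc_call())`, where `rpc_call` stands for whatever function performs the request.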

a6c12d4#diff-f9cd50535106be49873b810d27e0edb5R325-R332

After this commit, the signer log (on Docker) looks like this:

signer3_1   | [2020-02-21T05:44:36Z INFO  tapyrus_signer::signer_node] Start next round: self_index=0, master_index=0
signer2_1   | [2020-02-21T05:44:46Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=2
signer1_1   | [2020-02-21T05:44:46Z INFO  tapyrus_signer::signer_node] Start next round: self_index=1, master_index=1
signer3_1   | [2020-02-21T05:45:36Z INFO  tapyrus_signer::signer_node] Broadcast candidate block. block hash for signing: Hash(b2b16f05f0ab72efc1b1acb8d6f2528276d40eefe925ab61f4dbe436b7feb703)
signer3_1   | [2020-02-21T05:45:37Z INFO  tapyrus_signer::signer_node::message_processor::process_candidateblock] candidateblock received. block hash for signing: Hash(b2b16f05f0ab72efc1b1acb8d6f2528276d40eefe925ab61f4dbe436b7feb703)
signer2_1   | [2020-02-21T05:45:46Z ERROR tapyrus_signer::signer_node] RPC getnewblock failed. reason=JsonRpc(Hyper(Io(Custom { kind: Other, error: "failed to lookup address information: Name or service not known" })))
signer2_1   | [2020-02-21T05:45:46Z INFO  tapyrus_signer::signer_node::message_processor::process_candidateblock] candidateblock received. block hash for signing: Hash(b2b16f05f0ab72efc1b1acb8d6f2528276d40eefe925ab61f4dbe436b7feb703)
signer2_1   | [2020-02-21T05:45:46Z WARN  tapyrus_signer::signer_node::message_processor::process_candidateblock] Received Invalid candidate block sender: 02a1c8965ed06987fa6d7e0f552db707065352283ab3c1471510b12a76a5905287
signer1_1   | [2020-02-21T05:45:46Z ERROR tapyrus_signer::signer_node] RPC getnewblock failed. reason=JsonRpc(Hyper(Io(Custom { kind: Other, error: "failed to lookup address information: Name or service not known" })))
signer1_1   | [2020-02-21T05:45:46Z INFO  tapyrus_signer::signer_node::message_processor::process_candidateblock] candidateblock received. block hash for signing: Hash(b2b16f05f0ab72efc1b1acb8d6f2528276d40eefe925ab61f4dbe436b7feb703)
signer1_1   | [2020-02-21T05:45:46Z WARN  tapyrus_signer::signer_node::message_processor::process_candidateblock] Received Invalid candidate block sender: 02a1c8965ed06987fa6d7e0f552db707065352283ab3c1471510b12a76a5905287
signer3_1   | [2020-02-21T05:45:46Z INFO  tapyrus_signer::signer_node] Start next round: self_index=0, master_index=1
signer1_1   | [2020-02-21T05:45:46Z ERROR tapyrus_signer::signer_node::message_processor::process_blockvss] Invalid blockvss message received. candidateblock was not received in this round yet, but got VSS.
signer2_1   | [2020-02-21T05:45:46Z ERROR tapyrus_signer::signer_node::message_processor::process_blockvss] Invalid blockvss message received. candidateblock was not received in this round yet, but got VSS.
signer1_1   | [2020-02-21T05:45:56Z INFO  tapyrus_signer::signer_node] Start next round: self_index=1, master_index=1
signer2_1   | [2020-02-21T05:45:56Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=2
signer2_1   | [2020-02-21T05:46:56Z ERROR tapyrus_signer::signer_node] RPC getnewblock failed. reason=JsonRpc(Hyper(Io(Custom { kind: Other, error: "failed to lookup address information: Name or service not known" })))
signer1_1   | [2020-02-21T05:46:56Z ERROR tapyrus_signer::signer_node] RPC getnewblock failed. reason=JsonRpc(Hyper(Io(Custom { kind: Other, error: "failed to lookup address information: Name or service not known" })))
signer3_1   | [2020-02-21T05:46:56Z INFO  tapyrus_signer::signer_node] Start next round: self_index=0, master_index=2
signer1_1   | [2020-02-21T05:47:06Z INFO  tapyrus_signer::signer_node] Start next round: self_index=1, master_index=1
signer2_1   | [2020-02-21T05:47:06Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=2
signer1_1   | [2020-02-21T05:48:06Z ERROR tapyrus_signer::signer_node] RPC getnewblock failed. reason=JsonRpc(Hyper(Io(Custom { kind: Other, error: "failed to lookup address information: Name or service not known" })))
signer2_1   | [2020-02-21T05:48:06Z ERROR tapyrus_signer::signer_node] RPC getnewblock failed. reason=JsonRpc(Hyper(Io(Custom { kind: Other, error: "failed to lookup address information: Name or service not known" })))
signer3_1   | [2020-02-21T05:48:06Z INFO  tapyrus_signer::signer_node] Start next round: self_index=0, master_index=0
signer1_1   | [2020-02-21T05:48:16Z INFO  tapyrus_signer::signer_node] Start next round: self_index=1, master_index=1
signer2_1   | [2020-02-21T05:48:16Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=2

In this case:

  • I have 3 tapyrus-core nodes (named tapyrus1, tapyrus2, and tapyrus3).
  • signer1 connects to tapyrus1, signer2 connects to tapyrus2, and signer3 connects to tapyrus3.
  • I started all nodes, including tapyrus-core and tapyrus-signer.
  • Then I stopped the processes in tapyrus1 and tapyrus2.

None of the signers panicked, but it seems that master_index was never updated after the rounds timed out: in the log above, signer1 and signer2 keep reporting their own index as master_index in every round.

I fixed the problem above. The signer node now updates its master_index even if getnewblock fails.

The SignerNode#start_new_round() function now returns the Master state with a DUMMY block when an error occurs.
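The idea can be sketched as follows. This is a simplified illustration, not the code from the commit: the `Block` and `NodeState` types, the closure standing in for the getnewblock RPC, and the round-robin `next_master_index` helper are all hypothetical. The point is that a failed RPC still yields a Master state, so the round can time out and the master index can advance.

```rust
/// Hypothetical stand-in for a candidate block.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Real(Vec<u8>),
    /// Placeholder used when getnewblock failed.
    Dummy,
}

/// Hypothetical, simplified node state.
#[derive(Debug, PartialEq)]
enum NodeState {
    Master { candidate: Block },
    Member,
}

/// Starts a round. `get_new_block` stands in for the getnewblock RPC.
/// On error the master logs the failure and continues with a dummy block
/// instead of panicking or staying stuck in the previous round.
fn start_new_round<F>(self_index: usize, master_index: usize, get_new_block: F) -> NodeState
where
    F: FnOnce() -> Result<Vec<u8>, String>,
{
    if self_index != master_index {
        return NodeState::Member;
    }
    let candidate = match get_new_block() {
        Ok(bytes) => Block::Real(bytes),
        Err(reason) => {
            eprintln!("RPC getnewblock failed. reason={}", reason);
            Block::Dummy
        }
    };
    NodeState::Master { candidate }
}

/// Assumed round-robin rotation applied when a round times out.
fn next_master_index(current: usize, signer_count: usize) -> usize {
    (current + 1) % signer_count
}
```

With three signers, `next_master_index` yields 0, 1, 2, 0, which is the same rotation of master_index that appears in the log below.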

d796c5f#diff-f9cd50535106be49873b810d27e0edb5R329-R337

signer1_1   | [2020-02-21T06:56:10Z INFO  tapyrus_signer::signer_node] Start next round: self_index=1, master_index=0
signer2_1   | [2020-02-21T06:56:11Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=0
signer3_1   | [2020-02-21T06:56:11Z INFO  tapyrus_signer::signer_node] Start next round: self_index=0, master_index=0
signer3_1   | [2020-02-21T06:57:11Z INFO  tapyrus_signer::signer_node] Broadcast candidate block. block hash for signing: Hash(86aef006bedd4bfdd76b0fa19a72df99ada19537ca1b1efcc740afcc99f012df)
signer1_1   | [2020-02-21T06:57:11Z INFO  tapyrus_signer::signer_node::message_processor::process_candidateblock] candidateblock received. block hash for signing: Hash(86aef006bedd4bfdd76b0fa19a72df99ada19537ca1b1efcc740afcc99f012df)
signer1_1   | [2020-02-21T06:57:11Z WARN  tapyrus_signer::signer_node::message_processor::process_candidateblock] Received Invalid candidate block sender: 02a1c8965ed06987fa6d7e0f552db707065352283ab3c1471510b12a76a5905287
signer2_1   | [2020-02-21T06:57:12Z INFO  tapyrus_signer::signer_node::message_processor::process_candidateblock] candidateblock received. block hash for signing: Hash(86aef006bedd4bfdd76b0fa19a72df99ada19537ca1b1efcc740afcc99f012df)
signer2_1   | [2020-02-21T06:57:12Z WARN  tapyrus_signer::signer_node::message_processor::process_candidateblock] Received Invalid candidate block sender: 02a1c8965ed06987fa6d7e0f552db707065352283ab3c1471510b12a76a5905287
signer3_1   | [2020-02-21T06:57:12Z INFO  tapyrus_signer::signer_node::message_processor::process_candidateblock] candidateblock received. block hash for signing: Hash(86aef006bedd4bfdd76b0fa19a72df99ada19537ca1b1efcc740afcc99f012df)
signer1_1   | [2020-02-21T06:57:13Z ERROR tapyrus_signer::signer_node::message_processor::process_blockvss] Invalid blockvss message received. candidateblock was not received in this round yet, but got VSS.
signer2_1   | [2020-02-21T06:57:13Z ERROR tapyrus_signer::signer_node::message_processor::process_blockvss] Invalid blockvss message received. candidateblock was not received in this round yet, but got VSS.
signer1_1   | [2020-02-21T06:57:20Z INFO  tapyrus_signer::signer_node] Start next round: self_index=1, master_index=1
signer2_1   | [2020-02-21T06:57:21Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=1
signer3_1   | [2020-02-21T06:57:21Z INFO  tapyrus_signer::signer_node] Start next round: self_index=0, master_index=1
signer1_1   | [2020-02-21T06:58:20Z ERROR tapyrus_signer::signer_node] RPC getnewblock failed. reason=JsonRpc(Hyper(Io(Custom { kind: Other, error: "failed to lookup address information: Name or service not known" })))
signer2_1   | [2020-02-21T06:58:31Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=2
signer1_1   | [2020-02-21T06:58:31Z INFO  tapyrus_signer::signer_node] Start next round: self_index=1, master_index=2
signer3_1   | [2020-02-21T06:58:31Z INFO  tapyrus_signer::signer_node] Start next round: self_index=0, master_index=2
signer2_1   | [2020-02-21T06:59:31Z ERROR tapyrus_signer::signer_node] RPC getnewblock failed. reason=JsonRpc(Hyper(Io(Custom { kind: Other, error: "failed to lookup address information: Name or service not known" })))
signer2_1   | [2020-02-21T06:59:41Z INFO  tapyrus_signer::signer_node] Start next round: self_index=2, master_index=0
signer1_1   | [2020-02-21T06:59:41Z INFO  tapyrus_signer::signer_node] Start next round: self_index=1, master_index=0
signer3_1   | [2020-02-21T06:59:41Z INFO  tapyrus_signer::signer_node] Start next round: self_index=0, master_index=0

This issue has already been resolved in #50.