Finschia/ostracon

Later-started nodes panic when a large number of nodes are started

torao opened this issue · 0 comments

Tendermint version (use tendermint version or git rev-parse --verify HEAD if installed from source):
0.33.6-0.3-17077261

Environment:

  • OS (e.g. from /etc/os-release): CentOS 7
  • Install tools:
  • Others:

What happened:

When I started 100 nodes running the BLS signature aggregation version for performance testing, the nodes started after roughly the 70th to 80th in the startup order hit the following error and the process aborted:

I[2021-02-10|08:45:36.399] Version info                                 module=main software=0.33.6 block=10 p2p=7
panic: Aggregated commit cannot make a VoteSet

goroutine 1 [running]:
github.com/tendermint/tendermint/types.CommitToVoteSet(0xc0012f2300, 0x11, 0xc00012af00, 0xc00047eca0, 0x0)
	github.com/tendermint/tendermint/types/block.go:765 +0x4d9
github.com/tendermint/tendermint/consensus.(*State).reconstructLastCommit(0xc0004d1600, 0xa, 0x0, 0xc0012df5c0, 0x6, 0xc0012f2300, 0x11, 0xc0012df5c8, 0x2, 0xc0012f2320, ...)
	github.com/tendermint/tendermint/consensus/state.go:543 +0x8b
github.com/tendermint/tendermint/consensus.NewState(0xc0001442d0, 0xa, 0x0, 0xc0012df5c0, 0x6, 0xc0012f2300, 0x11, 0xc0012df5c8, 0x2, 0xc0012f2320, ...)
	github.com/tendermint/tendermint/consensus/state.go:222 +0x51c
github.com/tendermint/tendermint/node.createConsensusReactor(0xc00013a160, 0xa, 0x0, 0xc0012df5c0, 0x6, 0xc0012f2300, 0x11, 0xc0012df5c8, 0x2, 0xc0012f2320, ...)
	github.com/tendermint/tendermint/node/node.go:383 +0x19b
github.com/tendermint/tendermint/node.NewNode(0xc00013a160, 0x13dfaa0, 0xc0002483c0, 0xc001311de0, 0x13c19a0, 0xc001315260, 0xc00131a080, 0x127b4f0, 0xc00131a090, 0x13df960, ...)
	github.com/tendermint/tendermint/node/node.go:658 +0xa13
github.com/tendermint/tendermint/node.DefaultNewNode(0xc00013a160, 0x13df960, 0xc001314b60, 0xc000277c58, 0xdb6bdd, 0xc000117080)
	github.com/tendermint/tendermint/node/node.go:102 +0x544
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1(0xc000117080, 0xc000086840, 0x0, 0x1, 0x0, 0x0)
	github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:106 +0x7a
github.com/spf13/cobra.(*Command).execute(0xc000117080, 0xc000086830, 0x1, 0x1, 0xc000117080, 0xc000086830)
	github.com/spf13/cobra@v1.0.0/command.go:842 +0x453
github.com/spf13/cobra.(*Command).ExecuteC(0x1b51e20, 0x2, 0xc000019100, 0x1121f4b)
	github.com/spf13/cobra@v1.0.0/command.go:950 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.0.0/command.go:887
github.com/tendermint/tendermint/libs/cli.Executor.Execute(0x1b51e20, 0x127cf68, 0x2, 0xc00002f040)
	github.com/tendermint/tendermint/libs/cli/setup.go:89 +0x3c
main.main()
	github.com/tendermint/tendermint/cmd/tendermint/main.go:48 +0x2f5

This is reproducible in my environment and seems to affect the nodes that are started later, but the number of nodes that fail to start varies from run to run.