moonbeam-foundation/moonbeam

Moonbase node crashes every 10 min

I'm running a Moonbase Alpha node, currently on version 0.33.0.
The node crashes every 10 minutes and works fine again after a restart. This has been happening since at least August, so it is not isolated to 0.33.0.
See the attached log for more details.
moonbase_crash.log

Arguments used:
--chain=alphanet --state-pruning=archive --rpc-max-connections=1000 --execution=wasm --wasm-execution=compiled --rpc-external --rpc-port=9933 --rpc-cors=all --rpc-methods=unsafe --prometheus-external --name="\U0001F6E1 DWELLIR MOONBASE ALPHA RPC 1 \U0001F6E1" --wasm-runtime-overrides=/home/polkadot/wasm --runtime-cache-size=16 --max-runtime-instances=32 -- --execution=wasm --bootnodes=/dns/0.westend.paritytech.net/tcp/30333/p2p/12D3KooWKer94o1REDPtAhjtYR4SdLehnSrN8PEhBnZm5NBoCrMC --bootnodes=/dns/westend.bootnode.amforc.com/tcp/30333/p2p/12D3KooWJ5y9ZgVepBQNW4aabrxgmnrApdVnscqgKWiUu4BNJbC8

Sep 27 07:21:36 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:36 [🌗] 💤 Idle (1 peers), best: #5186945 (0xd139…b580), finalized #5186944 (0x1f86…e42a), ⬇ 0.2kiB/s ⬆ 0.2kiB/s
Sep 27 07:21:36 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:36 Accepting new connection 1/1000
Sep 27 07:21:36 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:36 HTTP serve connection failed hyper::Error(Shutdown, Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" })
Sep 27 07:21:37 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:37 Accepting new connection 1/1000
Sep 27 07:21:37 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:37 HTTP serve connection failed hyper::Error(Shutdown, Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" })
Sep 27 07:21:38 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:38 Accepting new connection 1/1000
Sep 27 07:21:38 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:38 HTTP serve connection failed hyper::Error(Shutdown, Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" })
Sep 27 07:21:39 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:39 Accepting new connection 1/1000
Sep 27 07:21:41 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:41 [Relaychain] 💤 Idle (8 peers), best: #12248368 (0x20e5…ae7c), finalized #12248365 (0x0629…5836), ⬇ 2.7kiB/s ⬆ 1.7kiB/s
Sep 27 07:21:41 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:41 [🌗] 💤 Idle (2 peers), best: #5186945 (0xd139…b580), finalized #5186944 (0x1f86…e42a), ⬇ 0.6kiB/s ⬆ 0.5kiB/s
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] ✨ Imported #12248369 (0x2556…6c25)
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] ♻️  Reorg on #12248369,0x2556…6c25 to #12248369,0xe211…c557, common ancestor #12248368,0x20e5…ae7c
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] ✨ Imported #12248369 (0xe211…c557)
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="availability-distribution-subsystem" err=FromOrigin { origin: "availability-distribution", source: IncomingMessageChannel(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="candidate-validation-subsystem" err=FromOrigin { origin: "candidate-validation", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="statement-distribution-subsystem" err=FromOrigin { origin: "statement-distribution", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] Overseer exited with error err=Generated(SubsystemStalled("availability-store-subsystem"))
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="bitfield-signing-subsystem" err=FromOrigin { origin: "bitfield-signing", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] Essential task `overseer` failed. Shutting down service.
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] error receiving message from subsystem context: Generated(Context("Signal channel is terminated and empty.")) err=Generated(Context("Signal channel is terminated and empty."))
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] Failed to receive a message from Overseer, exiting err=Generated(Context("Signal channel is terminated and empty."))
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="network-bridge-tx-subsystem" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="dispute-distribution-subsystem" err=FromOrigin { origin: "dispute-distribution", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="chain-api-subsystem" err=FromOrigin { origin: "chain-api", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="availability-recovery-subsystem" err=FromOrigin { origin: "availability-recovery", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="approval-voting-subsystem" err=FromOrigin { origin: "approval-voting", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="dispute-coordinator-subsystem" err=FromOrigin { origin: "dispute-coordinator", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] err=Subsystem(Generated(Context("Signal channel is terminated and empty.")))
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="network-bridge-rx-subsystem" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="provisioner-subsystem" err=FromOrigin { origin: "provisioner", source: OverseerExited(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="candidate-backing-subsystem" err=FromOrigin { origin: "candidate-backing", source: OverseerExited(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="runtime-api-subsystem" err=Generated(Context("Signal channel is terminated and empty."))
Sep 27 07:22:42 juju-1b6dd3-0 polkadot[1442906]: Error: Service(Other("Essential task failed."))
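
For anyone triaging a similar log: the only entry above that names a specific subsystem as the cause is the `Overseer exited with error` line reporting `SubsystemStalled("availability-store-subsystem")`. The `subsystem exited with error` lines all report the same terminated signal channel and appear to be fallout from the overseer shutting down. The following is a minimal Python sketch for separating the two, assuming the journal output has been saved as plain text. The function name and the regular expressions are illustrative and are not part of any Moonbeam or Polkadot tooling.

```python
import re

# Patterns are based only on the log lines quoted in this issue;
# other node versions may format these messages differently.
STALLED_RE = re.compile(
    r'Overseer exited with error err=Generated\(SubsystemStalled\("([^"]+)"\)\)'
)
EXITED_RE = re.compile(r'subsystem exited with error subsystem="([^"]+)"')


def summarize_crash(lines):
    """Return (stalled_subsystem_or_None, list_of_subsystems_that_exited)."""
    stalled = None
    exited = []
    for line in lines:
        match = STALLED_RE.search(line)
        if match and stalled is None:
            # Keep only the first stalled subsystem that is reported.
            stalled = match.group(1)
        match = EXITED_RE.search(line)
        if match:
            exited.append(match.group(1))
    return stalled, exited


# Three lines shortened from the log above (timestamps and hostnames dropped).
sample = [
    '[Relaychain] subsystem exited with error subsystem="candidate-validation-subsystem" '
    'err=FromOrigin { origin: "candidate-validation", source: Generated(Context("Signal channel is terminated and empty.")) }',
    '[Relaychain] Overseer exited with error err=Generated(SubsystemStalled("availability-store-subsystem"))',
    '[Relaychain] subsystem exited with error subsystem="chain-api-subsystem" '
    'err=FromOrigin { origin: "chain-api", source: Generated(Context("Signal channel is terminated and empty.")) }',
]

stalled, exited = summarize_crash(sample)
print("stalled subsystem:", stalled)
print("subsystems that exited:", exited)
```

To run it against a saved log, pass an open file instead of `sample`, for example `summarize_crash(open("moonbase_crash.log"))`.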

@bkchr I thought this was fixed already (this is using Polkadot v0.9.43).

Good question. Should have been? I don't remember 🙈

I reported it in polkadot-sdk, let's see:
paritytech/polkadot-sdk#1730

I deployed a new node, which is now fully synced and in use, and it doesn't have this problem.
So this no longer affects our services.

I still have the old node if you want me to try something with it.