Node cannot sync from scratch and crashes silently
Closed this issue · 9 comments
Starting with release 1.1.41, blockchain-node is not able to sync from scratch (tested on macOS Big Sur on a MacBook Pro).
After ingesting the snapshot, finding peers, and adding sync blocks, this is the last of the logs:
2021-10-25 12:25:49.122 [info] <0.1467.0>@blockchain_sync_handler:handle_data:127 adding sync blocks [1042561,1042562,1042563,1042564,1042565]
2021-10-25 12:25:49.516 [info] <0.1553.0>@blockchain_ledger_v1:has_snapshot:741 loading checkpoint from disk with ledger mode active
2021-10-25 12:25:49.517 [info] <0.1555.0>@blockchain_ledger_v1:has_snapshot:741 loading checkpoint from disk with ledger mode active
The last line repeats a few more times before the node exits with no error or crash message.
The only way I found out that this was happening was by running make stop and having the console error out and tell me that the node was already stopped.
For reference, release 1.1.40 does not have this issue, which suggests the problem was probably introduced by a change in blockchain-core.
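Since the node exits without writing an error, one way to notice the exit is to check how stale the last log line is. Below is a minimal Python sketch of that idea, assuming the log lines follow the format of the excerpts in this thread. The helper names (`parse_line`, `last_entry`, `seconds_since_last_entry`) are hypothetical and not part of blockchain-node.

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpts in this thread, for example:
# 2021-10-25 12:25:49.122 [info] <0.1467.0>@blockchain_sync_handler:handle_data:127 message
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>\w+)\] "
    r"<(?P<pid>[\d.]+)>@(?P<module>\w+):(?P<function>\w+):(?P<line>\d+) "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for a recognized log line, or None for anything else."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f")
    entry["line"] = int(entry["line"])
    return entry


def last_entry(lines):
    """Return the last recognized entry, skipping unparseable lines."""
    last = None
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            last = entry
    return last


def seconds_since_last_entry(lines, now):
    """Seconds between the last recognized entry and `now`, or None."""
    last = last_entry(lines)
    if last is None:
        return None
    return (now - last["ts"]).total_seconds()
```

If the returned gap keeps growing while the node is supposed to be syncing, that is a hint (not proof) that the process has exited. The threshold for "too stale" is something you would have to pick for your own setup.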
@syuan100 what snapshot is your node on? i.e. what's the lowest block height it has?
Initial snapshot loading starts at 1042510:
2021-10-25 11:54:30.373 [info] <0.1248.0>@blockchain_worker:do_s3_download:1115 Attempting snapshot download from "https://snapshots.helium.wtf/mainnet/snap-1042561", writing to scratch file "data/snap/snap-1042561.scratch"
...
2021-10-25 11:54:36.579 [info] <0.1248.0>@blockchain_worker:do_s3_download:1119 snap written to scratch file "data/snap/snap-1042561.scratch"
2021-10-25 11:54:36.579 [info] <0.1248.0>@blockchain_worker:start_snapshot_sync:1022 Successfully saved snap to disk in "data/snap/snap-1042561"
2021-10-25 11:54:37.116 [info] <0.1248.0>@blockchain_worker:attempt_load_snapshot_from_disk:1147 Stored snap 1042510 - attempting install
2021-10-25 11:54:37.116 [info] <0.1153.0>@blockchain_worker:handle_call:413 installing snapshot <<212,78,208,77,149,141,114,253,30,95,233,18,128,246,230,144,176,72,162,59,180,164,195,18,72,169,72,153,183,114,159,34>>
2021-10-25 11:54:45.042 [info] <0.1153.0>@blockchain_ledger_snapshot_v1:load_blocks:618 ledger height is 1042510 before absorbing snapshot
2021-10-25 11:54:45.043 [info] <0.1153.0>@blockchain_ledger_snapshot_v1:load_blocks:619 snapshot contains 66 blocks
...
Last loaded and saved block is 1042560:
2021-10-25 11:55:17.235 [info] <0.1153.0>@blockchain_ledger_snapshot_v1:load_blocks:656 saving block 1042560
2021-10-25 11:55:17.255 [info] <0.1153.0>@blockchain_ledger_snapshot_v1:load_blocks:663 loading block 1042560
2021-10-25 11:55:18.081 [info] <0.1153.0>@blockchain_txn_state_channel_close_v1:absorb:348 Closing with conflict false
2021-10-25 11:55:18.137 [info] <0.1153.0>@blockchain_ledger_v1:maybe_gc_scs:2101 gcing old state_channels...
2021-10-25 11:55:18.353 [info] <0.1153.0>@blockchain_ledger_snapshot_v1:import:497 ledger height is 1042560 after absorbing blocks
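In the logs above, the ledger height after the snapshot import is 1042560 and the first sync batch starts at 1042561, so the sync picks up right where the snapshot ends. A rough Python sketch for checking this from a log file is below. It assumes the two message formats shown in this thread, and the function names are made up for illustration.

```python
import re

# Message fragments copied from the log excerpts in this thread.
SYNC_BLOCKS = re.compile(r"adding sync blocks \[([\d,\s]+)\]")
LEDGER_HEIGHT = re.compile(r"ledger height is (\d+) after absorbing blocks")


def heights_after_import(lines):
    """Return (ledger height after import, lowest block in the first sync batch).

    Either value is None if the matching log line was not found.
    """
    ledger_height = None
    first_sync = None
    for line in lines:
        match = LEDGER_HEIGHT.search(line)
        if match:
            ledger_height = int(match.group(1))
        match = SYNC_BLOCKS.search(line)
        if match and first_sync is None:
            first_sync = min(int(h) for h in match.group(1).split(","))
    return ledger_height, first_sync


def sync_is_contiguous(lines):
    """True if the first synced block directly follows the imported height."""
    ledger_height, first_sync = heights_after_import(lines)
    if ledger_height is None or first_sync is None:
        return None
    return first_sync == ledger_height + 1
```

A result of True only says that the snapshot import and the first sync batch line up. It says nothing about why the node exits afterwards.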
@syuan100 could you try building with v1.1.42? There's a change in blockchain-core that might help with this issue. We're not totally sure, but it would be great to see if it improves things.
Hi @ke6jjj, I have tried with 1.1.42 and it still silently crashes.
Ok, thanks Steven. This still helps.
I also experience this with 1.1.41 and 1.1.42. It silently crashes, and the only indication is an error when you run make stop or call anything in the bin directory (e.g. the peer, repair, or info commands).
I've tried Erlang/OTP 22.3.1 and 23.0, since my AMI on AWS had 23.0 installed with asdf. OTP 22.3.1 gets further than 23.0 but still silently crashes.
System Information:
20.04.2-Ubuntu SMP Fri Oct 1 13:03:59 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
helium/blockchain-core#1076 is a probable fix for this.