helium/blockchain-core

regulatory_regions_not_set error case not handled

ci-work opened this issue · 7 comments

When replaying the ledger around block 1083100 (+/- a few hundred), etl and node call blockchain_txn_poc_receipts_v1:to_json, which calls tagged_path_elements_fold, which calls get_channels/2. That returns the error regulatory_regions_not_set, which is not handled in the catch of tagged_path_elements_fold.
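To make the unhandled case concrete, here is a minimal sketch of treating the error tuple as a normal return value instead of letting it fall through. It is not the code from blockchain-core: the module name, the one-argument `get_channels/1` stand-in, the `regions_unset` / `{regions_set, Channels}` inputs, and the `undefined` fallback are all assumptions made for illustration.

```erlang
-module(regions_sketch).
-export([fold_path_elements/2, get_channels/1]).

%% Hypothetical stand-in for get_channels/2: returns an error tuple
%% while the region vars are not yet set on the ledger.
get_channels(regions_unset) ->
    {error, regulatory_regions_not_set};
get_channels({regions_set, Channels}) ->
    {ok, Channels}.

%% Tag each path element with channel info when it is available.
fold_path_elements(LedgerState, Elements) ->
    case get_channels(LedgerState) of
        {ok, Channels} ->
            [{Elem, Channels} || Elem <- Elements];
        {error, regulatory_regions_not_set} ->
            %% Assumed fallback: region vars are not active at this
            %% height, so emit the elements without channel info.
            [{Elem, undefined} || Elem <- Elements]
    end.
```

With this shape, a replay over heights before the region vars were set would produce elements without channel data rather than an error the caller has to mask.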

@ci-work Do you have the crash handy in the logs from around the time it happened? and what follower project it was in (node/etl)?

It was in blockchain-node, and sorry, the logs are long gone. In blockchain-node it's actually masked by the try/catch: the catch calls to_json again without specifying the chain+ledger, and it processes without error. I presume that's because my chain is far ahead of the ledger height, so the chain variable is set (on the second call, without chain+ledger being passed in). If syncing naturally, without a higher blockchain.db dump, then it would fail, I guess?

The error isn't present earlier, and it clears up later in the chain. Since only I have encountered it, and I only noticed it by adding extra logging and then removing the try/catch to get the cause, it's likely not a big issue. However, others may run into this down the line, especially if more people go the blockchain-node route rather than etl as chain size increases.

so this error regulatory_regions_not_set happens until 1091692

the region vars were set on chain at height: 1091693

at next epoch block, 1091703, absorb starts to drift away from transactions being saved, by the time the 2 transactions are saved it's +31 ahead:

@blockchain_txn:unvalidated_absorb_and_commit:502 validation took 0 absorb took 413 post took 226 ms height 1091734
@bn_txns:load_block:92 Saved 2 transactions at height 1091703

by the time it's at 1091777, getting error:

2021-12-20 06:03:11.445 [error] <0.1188.0>@blockchain_txn_rewards_v2:calculate_rewards_metadata:346 Caught error; couldn't calculate rewards metadata because: badarg
[{rocksdb,iterator,[#Ref<0.305294975.3863085096.141869>,#Ref<0.305294975.3863085096.141876>,[{snapshot,#Ref<0.305294975.3866755077.7954>}]],[]},
{blockchain_ledger_v1,rocks_fold,6,[{file,"/var/sites/blockchain-node/_build/default/lib/blockchain/src/ledger/v1/blockchain_ledger_v1.erl"},{line,3830}]},
{blockchain_ledger_v1,cache_fold,5,[{file,"/var/sites/blockchain-node/_build/default/lib/blockchain/src/ledger/v1/blockchain_ledger_v1.erl"},{line,3823}]},
{blockchain_txn_rewards_v2,securities_rewards,2,[{file,"/var/sites/blockchain-node/_build/default/lib/blockchain/src/transactions/v2/blockchain_txn_rewards_v2.erl"},{line,852}]},
{blockchain_txn_rewards_v2,finalize_reward_calculations,3,[{file,"/var/sites/blockchain-node/_build/default/lib/blockchain/src/transactions/v2/blockchain_txn_rewards_v2.erl"},{line,538}]},
{blockchain_txn_rewards_v2,calculate_rewards_metadata,3,[{file,"/var/sites/blockchain-node/_build/default/lib/blockchain/src/transactions/v2/blockchain_txn_rewards_v2.erl"},{line,338}]},
{bn_txns,to_json,3,[{file,"/var/sites/blockchain-node/src/bn_txns.erl"},{line,249}]},
{bn_txns,'-save_transactions/5-fun-0-',8,[{file,"/var/sites/blockchain-node/src/bn_txns.erl"},{line,198}]}]

this error also occurs at blocks: 1091811, 1092233, 1092299, 1092428, 1092407, 1092445, 1092477, 1092542, 1092626, 1092638, 1092708, 1092807, 1093057, 1093127, 1093249, 1093302, 1093314, 1093473, 1093526, 1093603, 1093641, 1093677, 1093714, 1093825, 1093859, 1093984, 1094035, 1094143, 1094189, 1094227, 1094326, 1094393, 1094435, 1094775, 1094818, 1094857, 1094891, 1094934, 1095146, 1095294, 1095387, 1095714, 1095864, 1095982, 1096225

intermittent epoch/reward blocks start throwing height_too_old from 1092265, then all epoch blocks after 1096225 are height_too_old, which prevents calculate_rewards_metadata (in blockchain-node) from running, therefore no detailed rewards are saved.

my gut says (lol) that both issues are caused by the chain absorb to ledger drifting ahead from block 1091693 when region vars were activated on chain, but I do not know why this is the case

height_too_old is because the region_vars error happens. Since node is async, the ledger in core keeps moving ahead, and after about 50 blocks you get to the height_too_old messages.
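The drift can be sketched as simple arithmetic on the two heights from the log lines earlier in the thread. This is only an illustration, not how blockchain-core implements the check: `lag_sketch`, `lag/2`, `read_status/3`, and the `Window` parameter are hypothetical names, and the window of 50 is taken from the "about 50 blocks" figure above.

```erlang
-module(lag_sketch).
-export([lag/2, read_status/3]).

%% Number of blocks the ledger is ahead of the follower's processing.
lag(LedgerHeight, ProcessedHeight) ->
    LedgerHeight - ProcessedHeight.

%% Hypothetical check: Window is how many blocks back can still be read.
read_status(LedgerHeight, ProcessedHeight, Window) ->
    case lag(LedgerHeight, ProcessedHeight) of
        Lag when Lag > Window -> {error, height_too_old};
        _ -> ok
    end.
```

With the heights reported above (absorb at 1091734, transactions saved at 1091703), `lag_sketch:lag(1091734, 1091703)` is 31, which is still inside an assumed window of 50. Once the follower falls further behind than the window, the sketch returns `{error, height_too_old}`.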

why is it async when it has a requirement to be blocking to do calculate_rewards_metadata? any (easy) way to make it not async?

I'll get this fixed by actually fixing the bug instead of working around it

Thanks @madninja, folded in and replaying ledger from 1070600 now.