hyperledger/besu

Besu failing to index event log

ajsutton opened this issue · 9 comments

Description

On one of the EF teku-besu kiln nodes, the besu instance is failing to return one of the deposit event logs. Since the CL requires all deposit events in order Teku is complaining about this quite a lot.

This is on box 164.92.203.174

JSON RPC request is:

{"jsonrpc":"2.0","method":"eth_getLogs","params":[{"topics":["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"],"fromBlock":"0x8547d","toBlock":"0x8547d","address":["0x4242424242424242424242424242424242424242"]}],"id":0}

The broken besu instance returns:

{
  "jsonrpc" : "2.0",
  "id" : 0,
  "result" : [ ]
}

But another teku-besu instance on kiln returns the deposit event correctly:

{
  "jsonrpc" : "2.0",
  "id" : 0,
  "result" : [ {
    "logIndex" : "0x0",
    "removed" : false,
    "blockNumber" : "0x8547d",
    "blockHash" : "0xa62e2b2f55447ecbf5b603fa032746222da6039f27851175a3214812029ebc70",
    "transactionHash" : "0x589c0803aef2b3d0ecd48888bd0f9d799fd2421860c46bf91e2e21f0eb82414d",
    "transactionIndex" : "0x3c",
    "address" : "0x4242424242424242424242424242424242424242",
    "data" : "0x00000000000000000000000000000000000000000000000000000000000000a000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000140000000000000000000000000000000000000000000000000000000000000018000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000000030933f02fc10608e5b41513035b9f8f08306a0c54fcb2b04a673881987e6eb089b983481f5d888d4f6e08ebffbb41fcc4f000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020005db87e87315c6545e51a5daff02eab6df948cc8c9a49d6d7d03b80d1f18096000000000000000000000000000000000000000000000000000000000000000800405973070000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000060aa19ef95787fe491388337f47fdb67936ae0aeb437766b5cc44402c8ecd08d51fe7cca7643d85fca8ec0a1c16f5102210e4b41bb6ef2f84964bdfeb733121d046e0d5cd7eb85ce8e399d0d4e61cc0cb979f58717e925e0c06a734f0311a772a00000000000000000000000000000000000000000000000000000000000000008442d000000000000000000000000000000000000000000000000000000000000",
    "topics" : [ "0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5" ]
  } ]
}

Somehow besu has failed to index this deposit event.

Had a second instance of this on ropsten (post merge). Resolved by stopping besu, deleting all the caches/*.cache files and restarting besu again. So besu had the deposit log fine but created incorrect log bloom cache files.

I finally have evidence of this leading to a failed block proposal - presumably because the block would have contained deposits?
https://ropsten.beaconcha.in/block/260588
Validator successfully attested during the epoch, but failed to propose a block.

Prysm shows

ERROR powchain: Unable to process past deposit contract logs: Could not process deposit log: received incorrect merkle index: wanted 2974 but got 2980

edit Confirmed that rm caches/*.cache resolved the issue.

jflo commented

I noticed in the Teku case it is calling with the same from and to values, indicating a single block. I assume it is doing that because it wants to go through the bloom cache, as opposed to just passing the hash of the single block which would not use the cache.

@SeaMonkey82 is prysm doing the same thing?

Since clearing the cache, I haven't run into this again with prysm-besu, but for reference to the exact parameters prysm is using when calling eth_getLogs when this error occurs, you can check my thread for the same issue with erigon where it still happens consistently.

erigontech/erigon#4666

It looks like it calls for a range of 1000 blocks.

I noticed in the Teku case it is calling with the same from and to values, indicating a single block. I assume it is doing that because it wants to go through the bloom cache, as opposed to just passing the hash of the single block which would not use the cache.

We're not deliberately trying to hit or avoid the cache here - just using the same code path we use when pulling ranges to get the initial deposits. Just that once we catch up with the "head" (actually head minus follow distance) we are updating on each new block so the range to query is just a single block.

@ajsutton are you seeing this building from Main? @jflo has not been able to reproduce on latest. Can you report back so we can engage if needed?

We haven't seen this issue since updating to 22.4.4. So it seems the fix that went in there has worked - we used to hit this every couple of days.

jflo commented

Thanks folks, closing as fixed by underlying stateroot solution.

Found this on Besu-Lodestar node on Ropsten, after restarting Lodestar, Besu was updated before to 22.7.0

lodestar-beacon       | Aug-05 09:35:06.283[ETH1]            error: Error updating eth1 chain cache  Non consecutive deposits. deposit[6] = 1911, deposit[7] 1913
lodestar-beacon       | Error: Non consecutive deposits. deposit[6] = 1911, deposit[7] 1913
lodestar-beacon       |     at assertConsecutiveDeposits (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/utils/eth1DepositEvent.ts:9:13)
lodestar-beacon       |     at Eth1DepositsCache.add (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositsCache.ts:42:5)
lodestar-beacon       |     at Eth1DepositDataTracker.updateDepositCache (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:206:30)
lodestar-beacon       |     at runMicrotasks (<anonymous>)
lodestar-beacon       |     at processTicksAndRejections (node:internal/process/task_queues:96:5)
lodestar-beacon       |     at Eth1DepositDataTracker.update (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:186:33)
lodestar-beacon       |     at Eth1DepositDataTracker.runAutoUpdate (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:149:29)
lodestar-beacon       | Aug-05 09:35:07.299[ETH1]            error: Error updating eth1 chain cache  Non consecutive deposits. deposit[6] = 1911, deposit[7] 1913
lodestar-beacon       | Error: Non consecutive deposits. deposit[6] = 1911, deposit[7] 1913
lodestar-beacon       |     at assertConsecutiveDeposits (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/utils/eth1DepositEvent.ts:9:13)
lodestar-beacon       |     at Eth1DepositsCache.add (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositsCache.ts:42:5)
lodestar-beacon       |     at Eth1DepositDataTracker.updateDepositCache (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:206:30)
lodestar-beacon       |     at runMicrotasks (<anonymous>)
lodestar-beacon       |     at processTicksAndRejections (node:internal/process/task_queues:96:5)
lodestar-beacon       |     at Eth1DepositDataTracker.update (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:186:33)
lodestar-beacon       |     at Eth1DepositDataTracker.runAutoUpdate (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:149:29)

I am going to resync Besu from scratch, but will keep the previous data dir in case you need it to debug