Besu failing to index event log

Question

Besu failing to index event log

ajsutton opened this issue 3 years ago · 9 comments

Description

On one of the EF teku-besu kiln nodes, the besu instance is failing to return one of the deposit event logs. Since the CL requires all deposit events in order Teku is complaining about this quite a lot.

This is on box 164.92.203.174

JSON RPC request is:

{"jsonrpc":"2.0","method":"eth_getLogs","params":[{"topics":["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"],"fromBlock":"0x8547d","toBlock":"0x8547d","address":["0x4242424242424242424242424242424242424242"]}],"id":0}

The broken besu instance returns:

{
  "jsonrpc" : "2.0",
  "id" : 0,
  "result" : [ ]
}

But another teku-besu instance on kiln returns the deposit event correctly:

{
  "jsonrpc" : "2.0",
  "id" : 0,
  "result" : [ {
    "logIndex" : "0x0",
    "removed" : false,
    "blockNumber" : "0x8547d",
    "blockHash" : "0xa62e2b2f55447ecbf5b603fa032746222da6039f27851175a3214812029ebc70",
    "transactionHash" : "0x589c0803aef2b3d0ecd48888bd0f9d799fd2421860c46bf91e2e21f0eb82414d",
    "transactionIndex" : "0x3c",
    "address" : "0x4242424242424242424242424242424242424242",
    "data" : "0x00000000000000000000000000000000000000000000000000000000000000a000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000140000000000000000000000000000000000000000000000000000000000000018000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000000030933f02fc10608e5b41513035b9f8f08306a0c54fcb2b04a673881987e6eb089b983481f5d888d4f6e08ebffbb41fcc4f000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020005db87e87315c6545e51a5daff02eab6df948cc8c9a49d6d7d03b80d1f18096000000000000000000000000000000000000000000000000000000000000000800405973070000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000060aa19ef95787fe491388337f47fdb67936ae0aeb437766b5cc44402c8ecd08d51fe7cca7643d85fca8ec0a1c16f5102210e4b41bb6ef2f84964bdfeb733121d046e0d5cd7eb85ce8e399d0d4e61cc0cb979f58717e925e0c06a734f0311a772a00000000000000000000000000000000000000000000000000000000000000008442d000000000000000000000000000000000000000000000000000000000000",
    "topics" : [ "0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5" ]
  } ]
}

Somehow besu has failed to index this deposit event.

Answer 1 · 2022-06-10T14:14:12.000Z

Had a second instance of this on ropsten (post merge). Resolved by stopping besu, deleting all the caches/*.cache files and restarting besu again. So besu had the deposit log fine but created incorrect log bloom cache files.

Answer 2 · 2022-07-06T02:43:21.000Z

I finally have evidence of this leading to a failed block proposal - presumably because the block would have contained deposits?
https://ropsten.beaconcha.in/block/260588
Validator successfully attested during the epoch, but failed to propose a block.

Prysm shows

ERROR powchain: Unable to process past deposit contract logs: Could not process deposit log: received incorrect merkle index: wanted 2974 but got 2980

edit Confirmed that rm caches/*.cache resolved the issue.

Answer 3 · 2022-07-07T18:46:32.000Z

I noticed in the Teku case it is calling with the same from and to values, indicating a single block. I assume it is doing that because it wants to go through the bloom cache, as opposed to just passing the hash of the single block which would not use the cache.

@SeaMonkey82 is prysm doing the same thing?

Answer 4 · 2022-07-07T23:41:56.000Z

Since clearing the cache, I haven't run into this again with prysm-besu, but for reference to the exact parameters prysm is using when calling eth_getLogs when this error occurs, you can check my thread for the same issue with erigon where it still happens consistently.

erigontech/erigon#4666

It looks like it calls for a range of 1000 blocks.

Answer 5 · 2022-07-08T00:16:28.000Z

I noticed in the Teku case it is calling with the same from and to values, indicating a single block. I assume it is doing that because it wants to go through the bloom cache, as opposed to just passing the hash of the single block which would not use the cache.

We're not deliberately trying to hit or avoid the cache here - just using the same code path we use when pulling ranges to get the initial deposits. Just that once we catch up with the "head" (actually head minus follow distance) we are updating on each new block so the range to query is just a single block.

Answer 6 · 2022-07-19T17:02:13.000Z

@ajsutton are you seeing this building from Main? @jflo has not been able to reproduce on latest. Can you report back so we can engage if needed?

Answer 7 · 2022-07-19T22:33:02.000Z

We haven't seen this issue since updating to 22.4.4. So it seems the fix that went in there has worked - we used to hit this every couple of days.

Answer 8 · 2022-07-20T17:50:15.000Z

Thanks folks, closing as fixed by underlying stateroot solution.

Answer 9 · 2022-08-05T09:39:39.000Z

Found this on Besu-Lodestar node on Ropsten, after restarting Lodestar, Besu was updated before to 22.7.0

lodestar-beacon       | Aug-05 09:35:06.283[ETH1]            error: Error updating eth1 chain cache  Non consecutive deposits. deposit[6] = 1911, deposit[7] 1913
lodestar-beacon       | Error: Non consecutive deposits. deposit[6] = 1911, deposit[7] 1913
lodestar-beacon       |     at assertConsecutiveDeposits (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/utils/eth1DepositEvent.ts:9:13)
lodestar-beacon       |     at Eth1DepositsCache.add (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositsCache.ts:42:5)
lodestar-beacon       |     at Eth1DepositDataTracker.updateDepositCache (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:206:30)
lodestar-beacon       |     at runMicrotasks (<anonymous>)
lodestar-beacon       |     at processTicksAndRejections (node:internal/process/task_queues:96:5)
lodestar-beacon       |     at Eth1DepositDataTracker.update (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:186:33)
lodestar-beacon       |     at Eth1DepositDataTracker.runAutoUpdate (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:149:29)
lodestar-beacon       | Aug-05 09:35:07.299[ETH1]            error: Error updating eth1 chain cache  Non consecutive deposits. deposit[6] = 1911, deposit[7] 1913
lodestar-beacon       | Error: Non consecutive deposits. deposit[6] = 1911, deposit[7] 1913
lodestar-beacon       |     at assertConsecutiveDeposits (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/utils/eth1DepositEvent.ts:9:13)
lodestar-beacon       |     at Eth1DepositsCache.add (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositsCache.ts:42:5)
lodestar-beacon       |     at Eth1DepositDataTracker.updateDepositCache (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:206:30)
lodestar-beacon       |     at runMicrotasks (<anonymous>)
lodestar-beacon       |     at processTicksAndRejections (node:internal/process/task_queues:96:5)
lodestar-beacon       |     at Eth1DepositDataTracker.update (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:186:33)
lodestar-beacon       |     at Eth1DepositDataTracker.runAutoUpdate (file:///usr/app/node_modules/@lodestar/beacon-node/src/eth1/eth1DepositDataTracker.ts:149:29)

I am going to resync Besu from scratch, but will keep the previous data dir in case you need it to debug