lidofinance/lido-oracle

Oracle look back (non-waiting) algorithm


Current approach of the awaiting main loop

branch: daemon_v2, commit 86d343dc415d373892c7aefdd38713787b55fc9a

  1. The oracle daemon fetches the current frame from the ETH1 contract (it contains the reportable epoch ID).
  2. Then it gets the last finalized epoch from ETH2.
  3. If the finalized epoch is less than the reportable epoch ID, the daemon sleeps and retries from step 2.
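For reference, the current waiting loop (steps 1 to 3 above) boils down to something like the following minimal Python sketch. It is not the daemon_v2 code: get_reportable_epoch and get_finalized_epoch are hypothetical callables standing in for the ETH1 contract call and the beacon node query, and sleep is injectable only so the loop can be exercised without actually sleeping.

```python
import time
from typing import Callable


def wait_for_reportable_epoch(
    get_reportable_epoch: Callable[[], int],  # hypothetical: reads the current frame from ETH1
    get_finalized_epoch: Callable[[], int],   # hypothetical: asks the beacon node
    sleep_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the current frame's reportable epoch is finalized."""
    # Step 1: the reportable epoch of the current frame, read once.
    reportable_epoch = get_reportable_epoch()
    while True:
        # Step 2: the last finalized epoch on ETH2.
        finalized_epoch = get_finalized_epoch()
        if finalized_epoch >= reportable_epoch:
            return reportable_epoch
        # Step 3: not finalized yet, so sleep and go back to step 2.
        sleep(sleep_seconds)
```

If the network has lost finality, get_finalized_epoch() never catches up with the reportable epoch and the function never returns, which is Problem 1 below.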

What's wrong with it

Problem 1: This loop can take a long time, especially if the network has lost finality.

Problem 2: The previous reportable epoch (frame) may already be finalized on ETH2 but not reported yet. Still, the daemon will wait, potentially forever, for the current reportable epoch to get finalized.

Problem 3: Report races. With shorter WAIT times, the daemons are very likely to send their reports in parallel, and once the quorum is reached, the delayed transactions will be reverted. This leads to failed (red) transactions on Etherscan and wasted gas.

Problem 4: Bad UX for the supervisor (a human) who is attached to the daemon interactively (in DAEMON=0 mode) and waits for the interactive prompt. In production this will take up to one day, and the next retry is possible only the following day.

Problem 5 (@vshvsh found this irrelevant): The daemons don't try to determine the exact ETH1 state (block) from which to poll the contract state. Oracles polling at different times will probably get different numbers. For example, changes in SPs and keys will probably make reports for the same epochId diverge.

Proposed approach

  1. Get the current frame.
  2. If its reportable epoch is not finalized,
  3. the daemon goes one frame back and calculates the reportable epoch of the previous frame.
  4. If that reportable epoch is not finalized either, the daemon repeats step 3 until it reaches the already pushed epoch ID, in which case it goes back to step 1.
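A minimal sketch of the proposed look-back search (steps 2 to 4), assuming reportable epochs are spaced exactly epochs_per_frame apart and that the last pushed epoch is already known. The function and parameter names are hypothetical, not taken from the daemon.

```python
from typing import Optional


def look_back_reportable_epoch(
    current_reportable_epoch: int,
    epochs_per_frame: int,
    last_finalized_epoch: int,
    last_pushed_epoch: int,
) -> Optional[int]:
    """Walk back frame by frame from the current frame.

    Returns the newest reportable epoch that is finalized and newer than the
    last pushed one, or None if the walk reaches the last pushed epoch first.
    """
    epoch = current_reportable_epoch
    while epoch > last_pushed_epoch:
        if epoch <= last_finalized_epoch:
            # Finalized on ETH2 and not pushed yet: this is the one to report.
            return epoch
        # Step 3: go one frame back.
        epoch -= epochs_per_frame
    # Step 4: reached the already pushed epoch, nothing to report yet.
    return None
```

For example, with a current reportable epoch of 160, 20 epochs per frame, the last finalized epoch at 142 and epoch 120 already pushed, it returns 140; a None result means the daemon goes back to step 1.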

PS: Each ETH1 poll will need to iteratively determine the exact ETH1 block (I'd suggest taking the first block with timestamp > epochId.end). Even though this was initially marked as absolutely irrelevant, the look-back approach may look many days back, so I'd suggest considering it.
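The block lookup in the PS could be a binary search over block numbers rather than a linear scan, since ETH1 block timestamps only grow with block number. This is a sketch under assumptions: epoch_end_timestamp treats an epoch as ending when the next epoch starts, and get_block_timestamp is a hypothetical callable (for example, a thin wrapper around web3.py's block lookup).

```python
from typing import Callable, Optional


def epoch_end_timestamp(epoch: int, genesis_time: int, slots_per_epoch: int, seconds_per_slot: int) -> int:
    # Assumed definition: the epoch ends when the first slot of the next epoch starts.
    return genesis_time + (epoch + 1) * slots_per_epoch * seconds_per_slot


def first_block_after(
    timestamp: int,
    get_block_timestamp: Callable[[int], int],  # hypothetical: block number -> block timestamp
    latest_block: int,
) -> Optional[int]:
    """Binary-search the first block whose timestamp is strictly greater than `timestamp`.

    Relies on block timestamps being non-decreasing with block number.
    Returns None if even the latest block is not past `timestamp`.
    """
    if get_block_timestamp(latest_block) <= timestamp:
        return None
    lo, hi = 0, latest_block
    while lo < hi:
        mid = (lo + hi) // 2
        if get_block_timestamp(mid) > timestamp:
            hi = mid  # mid qualifies, so the answer is mid or an earlier block
        else:
            lo = mid + 1  # mid is too early
    return lo
```

With the test-network parameters from the logs below (genesis 1607192016, 8 slots per epoch, 1 second per slot), epoch 100 ends at 1607192824. The search needs roughly log2(latest_block) timestamp lookups, so looking many days back stays cheap.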

Is infinite waiting still possible?

Yes, if there is nothing to report (all the past frames have either been pushed or already reported).

@vshvsh, could you please take a look? I think it will work in most of the cases I can imagine.

Yeah, I was proposing something essentially identical, with a few caveats: only one poll of eth1 is needed, and I think the oracle must exit at step 4 instead of going back to step 1. So this is how it should work:

  1. Get all the most recent keys from eth1.
  2. Find the most recent frame that is both reportable and finalized, and calculate its reportable epoch.
  3. If there's no such frame, exit.
  4. If there is, report the reportable epoch.

For step 2, you don't have to do it in a loop or make weird historical queries to eth1.

Find the last finalized epoch, then find the last finalized frame, basically by integer division: (finalized_epoch div epochs_per_frame) * epochs_per_frame. Then check it against the last reported epoch; if it's greater, we can report it.
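The integer-division check can be written as below. This is a sketch assuming frames are aligned to epoch 0 (frame starts are multiples of epochs_per_frame, which matches the 100/120/140/160 epochs in the logs below); the function names are hypothetical.

```python
from typing import Optional


def last_finalized_frame_epoch(finalized_epoch: int, epochs_per_frame: int) -> int:
    # Round the finalized epoch down to the first epoch of its frame.
    return (finalized_epoch // epochs_per_frame) * epochs_per_frame


def epoch_to_report(finalized_epoch: int, epochs_per_frame: int, last_reported_epoch: int) -> Optional[int]:
    """Return the epoch to report, or None if there is nothing new (the oracle should exit)."""
    candidate = last_finalized_frame_epoch(finalized_epoch, epochs_per_frame)
    return candidate if candidate > last_reported_epoch else None
```

A None result corresponds to step 3 (exit): for instance, with the last finalized epoch at 126 and epoch 120 already reported, there is nothing new, while a finalized epoch of 142 yields 140.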

As far as I understand, the pyrmont branch implements exactly the algorithm proposed by @vshvsh. Please correct me if I'm wrong.

If so, we should use it.

Let's go step by step:

The entrypoint is CMD ["python3", "-u", "oracle.py"]

The main loop starts at oracle.py:L156:

while True:

The main behaviour is waiting until the slot gets finalized. If it is not finalized, the continue statement skips to the next iteration of the loop:

continue

So pyrmont's behaviour is "awaiting", and it's nearly identical to daemon_v2. Neither satisfies step 2: neither one searches for a finalized frame, both just wait.

It was introduced into our branch by cherry-picking this commit, which came from pyrmont (3ba6a8c) when we merged the nightly hotfixes back. I can't show how it was originally implemented, since we dropped the most conflicting commits before rebasing.

If there is still an implementation that conforms more closely to @vshvsh's spec, please point me to it and we'll definitely pick it up.

Yeah, pyrmont isn't doing that either.

Snippets of operation

Weather: Normal
Responded: y

    INFO 2020-12-05 21:27:23 <daemon> MEMBER_PRIV_KEY provided, running in transactable (PRODUCTION) mode
    INFO 2020-12-05 21:27:23 <daemon> Member account: 0x656e544DeaB532e9f5B8b8079B3809Aa1757fb0D
    INFO 2020-12-05 21:27:23 <daemon> DAEMON=0 Running in single iteration mode (will exit after reporting).
    INFO 2020-12-05 21:27:23 <daemon> ETH1_NODE=http://127.0.0.1:8545
    INFO 2020-12-05 21:27:23 <daemon> BEACON_NODE=http://127.0.0.1:5052 (Lighthouse API)
    INFO 2020-12-05 21:27:23 <daemon> SLEEP=60 s
    INFO 2020-12-05 21:27:23 <daemon> GAS_LIMIT=1500000 gas units
    INFO 2020-12-05 21:27:23 <daemon> POOL_CONTRACT=0xc12e8e7adcEaF31c1Ca5F8aFD99AB88439628183
    INFO 2020-12-05 21:27:23 <daemon> Oracle contract address: 0xcD3db5ca818a645359e09543Cc0e5b7bB9593229 (auto-discovered)
    INFO 2020-12-05 21:27:23 <daemon> Registry contract address: 0x7fAF80E96530e5cd13a1f35701fcc6b334B2FD75 (auto-discovered)
    INFO 2020-12-05 21:27:23 <daemon> Seconds per slot: 1 (auto-discovered)
    INFO 2020-12-05 21:27:23 <daemon> Slots per epoch: 8 (auto-discovered)
    INFO 2020-12-05 21:27:23 <daemon> Epochs per frame: 20 (auto-discovered)
    INFO 2020-12-05 21:27:23 <daemon> Genesis time: 1607192016 (auto-discovered)
    INFO 2020-12-05 21:27:23 <daemon> Starting the main loop
    INFO 2020-12-05 21:27:23 <daemon> Potentially reportable epoch: 100 (from ETH1 contract)
    INFO 2020-12-05 21:27:23 <daemon> Last finalized epoch: 101 (from Beacon)
    INFO 2020-12-05 21:27:23 <daemon> Reportable state: epoch:100 slot:800
    INFO 2020-12-05 21:27:23 <daemon> Quering NodeOperatorsRegistry...
    INFO 2020-12-05 21:27:23 <daemon> Node operators in registry: 2
    INFO 2020-12-05 21:27:23 <daemon> Node operator ID: 0 Keys: 10
    INFO 2020-12-05 21:27:23 <daemon> Node operator ID: 1 Keys: 10
    INFO 2020-12-05 21:27:23 <daemon> Total validator keys in registry: 20
    INFO 2020-12-05 21:27:23 <daemon> Fetching validators from Beacon node...
    INFO 2020-12-05 21:27:25 <daemon> Validator balances on beacon for slot: 800
    INFO 2020-12-05 21:27:25 <daemon> Pubkey: 0xa3d9511870 Balance: 32243655685 Gwei
    INFO 2020-12-05 21:27:25 <daemon> Pubkey: 0x94f9f5c1ad Balance: 32247016978 Gwei
    INFO 2020-12-05 21:27:25 <daemon> Pubkey: 0x8e7ebb0d21 Balance: 32246340858 Gwei
    INFO 2020-12-05 21:27:25 <daemon> Validator balances sum: 96737013521 Gwei
    INFO 2020-12-05 21:27:25 <daemon> Total balance on Beacon: 96737013521000000000 wei
    INFO 2020-12-05 21:27:25 <daemon> Lido validators on Beacon: 3
    INFO 2020-12-05 21:27:25 <daemon> Tx call data: oracle.reportBeacon(100, 96737013521000000000, 3)
    INFO 2020-12-05 21:27:25 <daemon> Calling tx locally is succeeded. Sending it to the network
    INFO 2020-12-05 21:27:25 <daemon> Tx data: {'value': 0, 'gasPrice': 1000000000, 'chainId': 1337, 'from': '0x656e544DeaB532e9f5B8b8079B3809Aa1757fb0D', 'gas': 1500000, 'to': '0xcD3db5ca818a645359e09543Cc0e5b7bB9593229', 'data': '0x62eeb73200000000000000000000000000000000000000000000000000000000000000640000000000000000000000000000000000000000000000053e7ee9433f5d6a000000000000000000000000000000000000000000000000000000000000000003'}
 WARNING 2020-12-05 21:27:25 <daemon> Should we sent this TX? [y/n]: 

Please respond with [y or n]: 
Please respond with [y or n]: u
Please respond with [y or n]: y
    INFO 2020-12-05 21:27:38 <daemon> Prepearing to send a tx...
    INFO 2020-12-05 21:27:43 <daemon> Transaction in progress...
    INFO 2020-12-05 21:27:48 <daemon> Transaction hash: 0x4f2692267ff97ecfd28c78878256d6c28c5dd1c0fa2986b695e3d8b1ce59c2d5
    INFO 2020-12-05 21:27:48 <daemon> Transaction successful
    INFO 2020-12-05 21:27:48 <daemon> We are in single-iteration mode, so exiting. Set DAEMON=1 env to run in the loop.
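The numbers in this run can be cross-checked by hand. The three balances sum to 96737013521 Gwei, which is 96737013521000000000 wei after multiplying by 10^9, and the 'data' field is the 4-byte selector 0x62eeb732 followed by the three reportBeacon arguments, each ABI-encoded as a 32-byte big-endian word. A minimal sketch reproducing that encoding; the selector is copied from the log rather than computed, and taking the validator count as the number of balances found on Beacon is an assumption.

```python
from typing import List

# Copied from the logged 'data' field; not recomputed from the function signature.
REPORT_BEACON_SELECTOR = "62eeb732"
WEI_PER_GWEI = 10**9


def encode_uint256(value: int) -> str:
    # Static integer arguments are ABI-encoded as left-padded 32-byte words.
    return format(value, "064x")


def report_beacon_calldata(epoch: int, balances_gwei: List[int]) -> str:
    total_wei = sum(balances_gwei) * WEI_PER_GWEI  # beacon balances are in Gwei
    args = (epoch, total_wei, len(balances_gwei))  # assumption: one balance per Lido validator
    return "0x" + REPORT_BEACON_SELECTOR + "".join(encode_uint256(a) for a in args)
```

Calling report_beacon_calldata(100, [32243655685, 32247016978, 32246340858]) should give the same string as the Tx data field of this run.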

Weather: Normal
Responded: n

    INFO 2020-12-05 21:30:41 <daemon> MEMBER_PRIV_KEY provided, running in transactable (PRODUCTION) mode
    INFO 2020-12-05 21:30:41 <daemon> Member account: 0x656e544DeaB532e9f5B8b8079B3809Aa1757fb0D
    INFO 2020-12-05 21:30:42 <daemon> DAEMON=0 Running in single iteration mode (will exit after reporting).
    INFO 2020-12-05 21:30:42 <daemon> ETH1_NODE=http://127.0.0.1:8545
    INFO 2020-12-05 21:30:42 <daemon> BEACON_NODE=http://127.0.0.1:5052 (Lighthouse API)
    INFO 2020-12-05 21:30:42 <daemon> SLEEP=60 s
    INFO 2020-12-05 21:30:42 <daemon> GAS_LIMIT=1500000 gas units
    INFO 2020-12-05 21:30:42 <daemon> POOL_CONTRACT=0xc12e8e7adcEaF31c1Ca5F8aFD99AB88439628183
    INFO 2020-12-05 21:30:42 <daemon> Oracle contract address: 0xcD3db5ca818a645359e09543Cc0e5b7bB9593229 (auto-discovered)
    INFO 2020-12-05 21:30:42 <daemon> Registry contract address: 0x7fAF80E96530e5cd13a1f35701fcc6b334B2FD75 (auto-discovered)
    INFO 2020-12-05 21:30:42 <daemon> Seconds per slot: 1 (auto-discovered)
    INFO 2020-12-05 21:30:42 <daemon> Slots per epoch: 8 (auto-discovered)
    INFO 2020-12-05 21:30:42 <daemon> Epochs per frame: 20 (auto-discovered)
    INFO 2020-12-05 21:30:42 <daemon> Genesis time: 1607192016 (auto-discovered)
    INFO 2020-12-05 21:30:42 <daemon> Starting the main loop
    INFO 2020-12-05 21:30:42 <daemon> Potentially reportable epoch: 120 (from ETH1 contract)
    INFO 2020-12-05 21:30:42 <daemon> Last finalized epoch: 126 (from Beacon)
    INFO 2020-12-05 21:30:42 <daemon> Reportable state: epoch:120 slot:960
    INFO 2020-12-05 21:30:42 <daemon> Quering NodeOperatorsRegistry...
    INFO 2020-12-05 21:30:42 <daemon> Node operators in registry: 2
    INFO 2020-12-05 21:30:42 <daemon> Node operator ID: 0 Keys: 10
    INFO 2020-12-05 21:30:42 <daemon> Node operator ID: 1 Keys: 10
    INFO 2020-12-05 21:30:42 <daemon> Total validator keys in registry: 20
    INFO 2020-12-05 21:30:42 <daemon> Fetching validators from Beacon node...
    INFO 2020-12-05 21:30:42 <daemon> Validator balances on beacon for slot: 960
    INFO 2020-12-05 21:30:42 <daemon> Pubkey: 0x94f9f5c1ad Balance: 32332768784 Gwei
    INFO 2020-12-05 21:30:42 <daemon> Pubkey: 0x8e7ebb0d21 Balance: 32330373378 Gwei
    INFO 2020-12-05 21:30:42 <daemon> Pubkey: 0xa3d9511870 Balance: 32327823429 Gwei
    INFO 2020-12-05 21:30:42 <daemon> Validator balances sum: 96990965591 Gwei
    INFO 2020-12-05 21:30:42 <daemon> Total balance on Beacon: 96990965591000000000 wei
    INFO 2020-12-05 21:30:42 <daemon> Lido validators on Beacon: 3
    INFO 2020-12-05 21:30:42 <daemon> Tx call data: oracle.reportBeacon(120, 96990965591000000000, 3)
    INFO 2020-12-05 21:30:42 <daemon> Calling tx locally is succeeded. Sending it to the network
    INFO 2020-12-05 21:30:42 <daemon> Tx data: {'value': 0, 'gasPrice': 1000000000, 'chainId': 1337, 'from': '0x656e544DeaB532e9f5B8b8079B3809Aa1757fb0D', 'gas': 1500000, 'to': '0xcD3db5ca818a645359e09543Cc0e5b7bB9593229', 'data': '0x62eeb73200000000000000000000000000000000000000000000000000000000000000780000000000000000000000000000000000000000000000054205215329b0a6000000000000000000000000000000000000000000000000000000000000000003'}
 WARNING 2020-12-05 21:30:42 <daemon> Should we sent this TX? [y/n]: 
n
    INFO 2020-12-05 21:30:46 <daemon> We are in single-iteration mode, so exiting. Set DAEMON=1 env to run in the loop.

Weather: BAD (lost finality after slot 142)

Lighthouse's output

Dec 05 18:36:17.946 WARN Low peer count                          peer_count: 1, service: slot_notifier
Dec 05 18:36:17.948 INFO Synced                                  slot: 1361, block:    …  empty, epoch: 170, finalized_epoch: 142, finalized_root: 0x540a…18c2, peers: 1, service: slot_notifier
Dec 05 18:36:18.947 WARN Low peer count                          peer_count: 1, service: slot_notifier
Dec 05 18:36:18.947 INFO Synced                                  slot: 1362, block:    …  empty, epoch: 170, finalized_epoch: 142, finalized_root: 0x540a…18c2, peers: 1, service: slot_notifier
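The oracle run below logs "Last finalized epoch: 142 (from Beacon)", matching finalized_epoch: 142 in the Lighthouse notifier. A minimal sketch of reading that value, assuming the node serves the standard Eth Beacon Node API endpoint /eth/v1/beacon/states/{state_id}/finality_checkpoints; the oracle itself may use a different, Lighthouse-specific call, and both function names are hypothetical.

```python
import json
import urllib.request


def parse_finalized_epoch(payload: dict) -> int:
    # The standard Beacon API returns epochs as decimal strings.
    return int(payload["data"]["finalized"]["epoch"])


def fetch_finalized_epoch(beacon_node: str, state_id: str = "head") -> int:
    url = f"{beacon_node}/eth/v1/beacon/states/{state_id}/finality_checkpoints"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_finalized_epoch(json.load(resp))
```

With BEACON_NODE=http://127.0.0.1:5052 as in the logs, fetch_finalized_epoch("http://127.0.0.1:5052") would return 142 while finality is stuck, however far the head epoch advances.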

Oracle

    INFO 2020-12-05 21:37:08 <daemon> MEMBER_PRIV_KEY provided, running in transactable (PRODUCTION) mode
    INFO 2020-12-05 21:37:08 <daemon> Member account: 0x656e544DeaB532e9f5B8b8079B3809Aa1757fb0D
    INFO 2020-12-05 21:37:08 <daemon> DAEMON=0 Running in single iteration mode (will exit after reporting).
    INFO 2020-12-05 21:37:08 <daemon> ETH1_NODE=http://127.0.0.1:8545
    INFO 2020-12-05 21:37:08 <daemon> BEACON_NODE=http://127.0.0.1:5052 (Lighthouse API)
    INFO 2020-12-05 21:37:08 <daemon> SLEEP=60 s
    INFO 2020-12-05 21:37:08 <daemon> GAS_LIMIT=1500000 gas units
    INFO 2020-12-05 21:37:08 <daemon> POOL_CONTRACT=0xc12e8e7adcEaF31c1Ca5F8aFD99AB88439628183
    INFO 2020-12-05 21:37:08 <daemon> Oracle contract address: 0xcD3db5ca818a645359e09543Cc0e5b7bB9593229 (auto-discovered)
    INFO 2020-12-05 21:37:08 <daemon> Registry contract address: 0x7fAF80E96530e5cd13a1f35701fcc6b334B2FD75 (auto-discovered)
    INFO 2020-12-05 21:37:08 <daemon> Seconds per slot: 1 (auto-discovered)
    INFO 2020-12-05 21:37:08 <daemon> Slots per epoch: 8 (auto-discovered)
    INFO 2020-12-05 21:37:08 <daemon> Epochs per frame: 20 (auto-discovered)
    INFO 2020-12-05 21:37:08 <daemon> Genesis time: 1607192016 (auto-discovered)
    INFO 2020-12-05 21:37:08 <daemon> Starting the main loop
    INFO 2020-12-05 21:37:08 <daemon> Potentially reportable epoch: 160 (from ETH1 contract)
    INFO 2020-12-05 21:37:08 <daemon> Last finalized epoch: 142 (from Beacon)
    INFO 2020-12-05 21:37:08 <daemon> Reportable state: epoch:140 slot:1120
    INFO 2020-12-05 21:37:08 <daemon> Quering NodeOperatorsRegistry...
    INFO 2020-12-05 21:37:08 <daemon> Node operators in registry: 2
    INFO 2020-12-05 21:37:08 <daemon> Node operator ID: 0 Keys: 10
    INFO 2020-12-05 21:37:08 <daemon> Node operator ID: 1 Keys: 10
    INFO 2020-12-05 21:37:08 <daemon> Total validator keys in registry: 20
    INFO 2020-12-05 21:37:08 <daemon> Fetching validators from Beacon node...
    INFO 2020-12-05 21:37:08 <daemon> Validator balances on beacon for slot: 1120
    INFO 2020-12-05 21:37:08 <daemon> Pubkey: 0xa3d9511870 Balance: 32382418808 Gwei
    INFO 2020-12-05 21:37:08 <daemon> Pubkey: 0x8e7ebb0d21 Balance: 32386494864 Gwei
    INFO 2020-12-05 21:37:08 <daemon> Pubkey: 0x94f9f5c1ad Balance: 32394607707 Gwei
    INFO 2020-12-05 21:37:08 <daemon> Validator balances sum: 97163521379 Gwei
    INFO 2020-12-05 21:37:08 <daemon> Total balance on Beacon: 97163521379000000000 wei
    INFO 2020-12-05 21:37:08 <daemon> Lido validators on Beacon: 3
    INFO 2020-12-05 21:37:08 <daemon> Tx call data: oracle.reportBeacon(140, 97163521379000000000, 3)
    INFO 2020-12-05 21:37:08 <daemon> Calling tx locally is succeeded. Sending it to the network
    INFO 2020-12-05 21:37:08 <daemon> Tx data: {'value': 0, 'gasPrice': 1000000000, 'chainId': 1337, 'from': '0x656e544DeaB532e9f5B8b8079B3809Aa1757fb0D', 'gas': 1500000, 'to': '0xcD3db5ca818a645359e09543Cc0e5b7bB9593229', 'data': '0x62eeb732000000000000000000000000000000000000000000000000000000000000008c000000000000000000000000000000000000000000000005446a2be6595e1e000000000000000000000000000000000000000000000000000000000000000003'}
 WARNING 2020-12-05 21:37:08 <daemon> Should we sent this TX? [y/n]: 
y
    INFO 2020-12-05 21:37:13 <daemon> Prepearing to send a tx...
    INFO 2020-12-05 21:37:18 <daemon> Transaction in progress...
    INFO 2020-12-05 21:37:24 <daemon> Transaction hash: 0x2c933b4ca366a661810343d7e6503c8cac500e99ae201e55cd310827daa84309
    INFO 2020-12-05 21:37:24 <daemon> Transaction successful
    INFO 2020-12-05 21:37:24 <daemon> We are in single-iteration mode, so exiting. Set DAEMON=1 env to run in the loop.