status-im/infra-nimbus

Deploy dedicated Geth instances for Mainnet Nimbus nodes

jakubgs opened this issue · 5 comments

Since we are past Bellatrix we really need more than just one Mainnet Geth node for the 40 Beacon nodes we run:

nodes_layout:
  'stable-small-01.aws-eu-central-1a.nimbus.mainnet':
    - { branch: 'stable' }
  'stable-small-02.aws-eu-central-1a.nimbus.mainnet':
    - { branch: 'stable' }
  'metal-01.he-eu-hel1.nimbus.mainnet':
    - { branch: 'stable', num: 1 }
    - { branch: 'stable', num: 2 }
    - { branch: 'testing', num: 1, open_libp2p_ports: false }
    - { branch: 'testing', num: 2 }
    - { branch: 'unstable', num: 1, public_api: true }
    - { branch: 'unstable', num: 2 }
  'metal-02.he-eu-hel1.nimbus.mainnet':
    - { branch: 'stable', num: 1 }
    - { branch: 'stable', num: 2 }
    - { branch: 'testing', num: 1, public_api: true }
    - { branch: 'testing', num: 2 }
    - { branch: 'unstable', num: 1, open_libp2p_ports: false }
    - { branch: 'unstable', num: 2 }
  'metal-03.he-eu-hel1.nimbus.mainnet':
    - { branch: 'stable', num: 1 }
    - { branch: 'stable', num: 2 }
    - { branch: 'testing', num: 1 }
    - { branch: 'testing', num: 2 }
    - { branch: 'unstable', num: 1 }
    - { branch: 'unstable', num: 2 }
  'metal-04.he-eu-hel1.nimbus.mainnet':
    - { branch: 'stable', num: 1 }
    - { branch: 'stable', num: 2 }
    - { branch: 'testing', num: 1 }
    - { branch: 'testing', num: 2 }
    - { branch: 'unstable', num: 1 }
    - { branch: 'unstable', num: 2 }
  'metal-05.he-eu-hel1.nimbus.mainnet':
    - { branch: 'stable', num: 1 }
    - { branch: 'stable', num: 2 }
    - { branch: 'testing', num: 1 }
    - { branch: 'testing', num: 2 }
    - { branch: 'unstable', num: 1, db_purge: true }
    - { branch: 'unstable', num: 2 }
    - { branch: 'libp2p', num: 1 }
  'metal-06.he-eu-hel1.nimbus.mainnet':
    - { branch: 'stable', num: 1 }
    - { branch: 'stable', num: 2 }
    - { branch: 'testing', num: 1 }
    - { branch: 'testing', num: 2 }
    - { branch: 'unstable', num: 1, db_purge: true, db_sync: true }
    - { branch: 'unstable', num: 2 }
    - { branch: 'libp2p', num: 1 }
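As a quick sanity check of the "40 Beacon nodes" figure, here is a minimal sketch that tallies the entries per host. The counts are transcribed by hand from the layout above; the variable names are only illustrative and are not part of the repository.

```python
# Number of beacon node entries per host, transcribed from nodes_layout above.
nodes_per_host = {
    "stable-small-01.aws-eu-central-1a.nimbus.mainnet": 1,
    "stable-small-02.aws-eu-central-1a.nimbus.mainnet": 1,
    "metal-01.he-eu-hel1.nimbus.mainnet": 6,
    "metal-02.he-eu-hel1.nimbus.mainnet": 6,
    "metal-03.he-eu-hel1.nimbus.mainnet": 6,
    "metal-04.he-eu-hel1.nimbus.mainnet": 6,
    "metal-05.he-eu-hel1.nimbus.mainnet": 7,  # includes one 'libp2p' node
    "metal-06.he-eu-hel1.nimbus.mainnet": 7,  # includes one 'libp2p' node
}

# Total across all hosts, and the subset running on the metal hosts.
total_nodes = sum(nodes_per_host.values())
metal_nodes = sum(n for host, n in nodes_per_host.items() if host.startswith("metal-"))

print(total_nodes)  # 40
print(metal_nodes)  # 38
```

So 38 of the 40 beacon nodes live on the six metal hosts, with 6 or 7 nodes per host.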

But a snap-synced Mainnet Geth node currently takes up slightly over 900 GB, almost 1 TB. At that size, running 40 snap-synced Geth nodes would cost 20*40=800 EUR per month in storage alone, not counting the cost of hosts.

Considering we don't run any validators on these hosts, it might be fine to run 1 Geth node per some N beacon nodes.
A reasonable compromise might be one Geth node on each of the metal mainnet hosts, shared by the 6 or 7 beacon nodes running there.

I have requested an extra 2 TB NVMe SSD for each of the 6 metal mainnet hosts, which should cost 6 x €20.90 = €125.40.
https://docs.hetzner.com/robot/dedicated-server/general-information/root-server-hardware/
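The cost comparison in this thread can be sketched as follows. The figures (20 EUR per node, €20.90 per extra SSD, 40 beacon nodes, 6 metal hosts) come from the issue itself; the variable names are illustrative. The comparison is not strictly like-for-like, since the 800 EUR estimate covers all 40 nodes while the SSD purchase covers only the metal hosts.

```python
from decimal import Decimal

BEACON_NODES = 40                 # total beacon nodes in nodes_layout
METAL_HOSTS = 6                   # metal-01 .. metal-06
PER_NODE_EUR = Decimal("20")      # rough per-node storage figure used in the issue
EXTRA_SSD_EUR = Decimal("20.90")  # price of one extra 2 TB NVMe SSD quoted in the issue

# Option A: one snap-synced Geth node per beacon node.
one_geth_per_beacon = BEACON_NODES * PER_NODE_EUR

# Option B: one Geth node (one extra SSD) per metal host.
one_geth_per_host = METAL_HOSTS * EXTRA_SSD_EUR

savings = one_geth_per_beacon - one_geth_per_host

print(one_geth_per_beacon)  # 800
print(one_geth_per_host)    # 125.40
print(savings)              # 674.60
```

`Decimal` is used instead of floats so that the currency amounts stay exact.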

I would like to purchase an extra 2 TB NVMe SSD for these 6 hosts:

AX41-NVMe #1551432 - 95.217.87.121
AX41-NVMe #1551433 - 135.181.0.33
AX41-NVMe #1551434 - 135.181.60.170
AX41-NVMe #1551436 - 65.21.193.229
AX41-NVMe #1551437 - 135.181.60.177
AX41-NVMe #1551438 - 135.181.56.50

One for each of the listed hosts. My understanding, based on your documentation, is that it will cost 6 x €20.90:
https://docs.hetzner.com/robot/dedicated-server/general-information/root-server-hardware/

Apparently the correct way to make such requests is via the Robot > Server > Support tab > Product > Other route.

I have made the request for each host separately as support requested.

image

I've mounted the new volumes at /docker and deployed Geth nodes:

  • 98b38cbf - nimbus.mainnet: mount new NVMe volumes at /docker
  • b4066e73 - nimbus.mainnet: deploy one Geth node on metal hosts
  • 23a91206 - nimbus.mainnet: open ports for Geth exporter

They appear to be syncing:

image

Once they are synced, I will switch the beacon node instances on each host to their local Geth node.
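One way to check "synced" from a script, rather than from a dashboard, is the standard `eth_syncing` JSON-RPC method. The sketch below is not what the thread itself used. It assumes Geth's HTTP RPC is reachable at `http://localhost:8545`, which is a hypothetical URL; the actual port and exposure on these hosts are not stated here. Note also that a node can report `false` before it has found peers, so this is only a rough signal.

```python
import json
import urllib.request


def parse_syncing(result):
    """Interpret the `result` field of an eth_syncing response.

    Returns (synced, remaining_blocks). The method returns `false` when the
    node is not syncing, otherwise an object with hex-encoded block numbers.
    """
    if result is False:
        return True, 0
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    return False, max(highest - current, 0)


def query_syncing(url="http://localhost:8545"):  # assumed endpoint
    """Send eth_syncing to a Geth HTTP RPC endpoint and parse the reply."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_syncing(json.load(response)["result"])
```

For example, `parse_syncing({"currentBlock": "0x10", "highestBlock": "0x20"})` returns `(False, 16)`, meaning 16 blocks remain.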

I've connected the Beacon nodes to the local Geth nodes:

  • 513fb2bb - get-geth-api-urls: drop Infura URLs entirely
  • fc23654e - nimbus.mainnet: use local Geth nodes for metal hosts

And it appears to work fine:

image

The two AWS hosts will continue using the mainnet-01.aws-eu-central-1a.nimbus.geth host.
I consider this done.
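The resulting assignment of beacon hosts to Geth nodes can be summarised in a small sketch. The function name and the use of `localhost` for the local Geth node are assumptions made for illustration; the real logic lives in the commits referenced above and may differ.

```python
# Shared Geth host that the AWS beacon hosts keep using, as named in the thread.
SHARED_GETH_HOST = "mainnet-01.aws-eu-central-1a.nimbus.geth"


def geth_host_for(beacon_host: str) -> str:
    """Return the Geth host a beacon host should talk to (illustrative only)."""
    short_name = beacon_host.split(".", 1)[0]
    if short_name.startswith("metal-"):
        # Metal hosts run their own Geth node; 'localhost' is an assumption.
        return "localhost"
    return SHARED_GETH_HOST


print(geth_host_for("metal-03.he-eu-hel1.nimbus.mainnet"))
print(geth_host_for("stable-small-01.aws-eu-central-1a.nimbus.mainnet"))
```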