status-im/infra-nim-waku

Infra to run waku-simulator on latest nwaku master

alrevuelta opened this issue · 19 comments

To detect potential issues in nwaku as early as possible, we need an instance of waku-simulator deployed with the latest nwaku master commit. Every time we merge a new PR to nwaku, waku-simulator would be redeployed with that image, so we can monitor whether we are introducing any issues (especially related to networking or performance in general).

waku-simulator makes it easy to:

  • Create a network with an arbitrary number of nwaku nodes (max 250)
  • Automatically inject gossipsub traffic into the network with configurable parameters.
  • Monitor said network with an already provisioned grafana dashboard.

What would we need?

  • Some infra to run waku-simulator
  • Redeploy the setup on every new commit to nwaku master. Unsure if this requires changes in nwaku CI, or perhaps it can be auto detected?
  • A static IP with port :3000 open so that we can visualize the metrics.

Important notes:

  • waku-simulator generates a large volume of metrics, which is why we provide our own Grafana/Prometheus instance. I would suggest not "indexing" these metrics in the Status infra. To avoid using too much disk space, Prometheus retention time is set to 7 days.
  • Nodes run with a simple configuration, where each one should use < 100 MB of memory. A machine with 64 GB should be enough.
  • Disk space usage shouldn't be very high: the store protocol is not used, so the only data stored is the Prometheus metrics.
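
As a rough sanity check of the sizing above, the worst case can be worked out from the figures already given (250 nodes maximum, under 100 MB each, 64 GB host). This is only back-of-the-envelope arithmetic; it ignores the memory used by Prometheus, Grafana and the OS, and the variable names are illustrative.

```shell
# Worst-case node memory: maximum network size times the per-node upper bound.
max_nodes=250      # maximum number of nwaku nodes stated above
mb_per_node=100    # upper bound per node stated above
host_gb=64         # proposed machine size

total_mb=$((max_nodes * mb_per_node))
host_mb=$((host_gb * 1000))

echo "worst case: ${total_mb} MB of ${host_mb} MB"
```

Even at the maximum network size the nodes stay well under half of the proposed memory, which leaves headroom for the monitoring stack.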

TLDR:

We would need some infra to run waku-simulator so that, once a day, it deploys the latest nightly nwaku release (see nightly release).

This is the repo

git clone https://github.com/waku-org/waku-simulator.git
cd waku-simulator

And only LATEST_MASTER_PLACEHOLDER should be updated.

export NWAKU_IMAGE=statusteam/nim-waku:LATEST_MASTER_PLACEHOLDER
export NUM_NWAKU_NODES=100
export GOWAKU_IMAGE=statusteam/go-waku:v0.7.0
export NUM_GOWAKU_NODES=0
export MSG_PER_SECOND=10
export MSG_SIZE_KBYTES=10
docker-compose up -d

And then have the already provisioned dashboard available at ip:3000.
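
For whoever automates this, the steps above could be wrapped in a small function so that only the image tag changes between runs. This is a sketch, not part of waku-simulator: `deploy_simulator` and `DRY_RUN` are hypothetical names, the lower bound of 1 node is an assumption, and the upper bound of 250 comes from the limit mentioned earlier.

```shell
# Hypothetical wrapper around the manual commands above.
# Usage: deploy_simulator TAG [NUM_NODES]
# With DRY_RUN=1 the compose command is printed instead of executed.
deploy_simulator() {
  tag="${1:-}"
  nodes="${2:-100}"

  if [ -z "$tag" ]; then
    echo "usage: deploy_simulator TAG [NUM_NODES]" >&2
    return 1
  fi

  # Reject anything that is not a plain non-negative integer.
  case "$nodes" in
    ''|*[!0-9]*)
      echo "NUM_NWAKU_NODES must be an integer, got: $nodes" >&2
      return 1
      ;;
  esac

  # Upper bound of 250 is from the issue; lower bound of 1 is an assumption.
  if [ "$nodes" -lt 1 ] || [ "$nodes" -gt 250 ]; then
    echo "NUM_NWAKU_NODES must be between 1 and 250, got: $nodes" >&2
    return 1
  fi

  # Same settings as the manual steps; only the tag and node count vary.
  export NWAKU_IMAGE="statusteam/nim-waku:${tag}"
  export NUM_NWAKU_NODES="$nodes"
  export GOWAKU_IMAGE=statusteam/go-waku:v0.7.0
  export NUM_GOWAKU_NODES=0
  export MSG_PER_SECOND=10
  export MSG_SIZE_KBYTES=10

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker-compose up -d"
  else
    docker-compose up -d
  fi
}
```

Run from inside the waku-simulator checkout, `deploy_simulator LATEST_MASTER_PLACEHOLDER 100` would then be equivalent to the manual steps.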

cc @jakubgs

Edited: Instead of running waku-simulator on every merge to nwaku master, just run it once a day (see nightly release)

I have confirmed that an AX41-NVMe host from Hetzner will suffice for this.
https://www.hetzner.com/dedicated-rootserver/matrix-ax

Possibly with extra memory in the future.

Looks like Alexis already generalized my script and role for handling GitHub webhooks to update a local repo:

So I can reuse that.

I'm refactoring infra-role-github-webhook to handle running a task after repo update. Should finish tomorrow.

@alrevuelta was setting export NUM_GOWAKU_NODES=0 intentional, or did you mean 10?

Yep, 0. For now we will be focusing only on the nwaku<->nwaku integration.

Here's a PR to allow auto-updates of Docker images:

I'm the one working on this.

Here's the initial setup:

What's remaining:

  • Expose webhook publicly and configure with GitHub
  • Add OAuth-Proxy for Grafana container
  • Configure .env symlink to configure containers from repo
  • Add Consul healthchecks and additional services

We're going to use a wakusim.env file in the repo to allow devs to adjust settings on the wakusim.misc host:

I have configured Grafana dashboard at https://simulator.waku.org/ using OAuth proxy:

We're not using Grafana built-in OAuth because that would require changes in the waku-simulator repo itself.

Also added extra healthchecks:

And fixed location of .env symlink task.

I have a PR going to migrate from old statusteam org to wakuorg on Docker Hub:

That is part of a proper setup of automatic builds of master branch that will push a Docker latest tag.

I've exposed the /webhook path under the same https://simulator.waku.org/ domain using nginx proxy:

Also added missing webhook secret.
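
For context on what the secret is for: GitHub signs each webhook delivery with the shared secret and sends the result in the X-Hub-Signature-256 header as `sha256=` followed by the hex HMAC-SHA256 of the raw request body. Below is a minimal sketch of that check, not the code used in the github-webhook role. The function names are hypothetical and it assumes `openssl` and `awk` are installed.

```shell
# sign_payload SECRET BODY
# Prints the value GitHub would send in X-Hub-Signature-256 for this body.
sign_payload() {
  # printf '%s' avoids appending a newline, which would change the digest.
  printf '%s' "$2" \
    | openssl dgst -sha256 -hmac "$1" \
    | awk '{print "sha256=" $NF}'   # last field is the hex digest
}

# verify_signature SECRET BODY HEADER_VALUE
# Succeeds only if the header matches the locally computed signature.
verify_signature() {
  [ "$(sign_payload "$1" "$2")" = "$3" ]
}
```

Two caveats for anything beyond illustration: the string comparison here is not constant-time, and passing the secret as a command-line argument exposes it in the process list.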

Here are all the github-webhook Ansible role changes:

I had to modify the infra-bi repo to make it compatible:

Some more fixes for github-webhook running post update action:

And adjustments to Waku simulator role to allow restarting the compose service:

And it works:

server.py[248029]: INFO - jakubgs pushed refs/heads/master in waku-org/waku-simulator (8ce2afca-5d64-11ee-956f-7aa842426278)
server.py[248029]: INFO - New commit available: b58f2b0b31b26a571afe7623c24788ecf775cf9f
server.py[248029]: INFO - Updated repo to: b58f2b0b31b26a571afe7623c24788ecf775cf9f
server.py[248029]: INFO - Running post action!
server.py[248029]: INFO - Running command: /usr/bin/sudo systemctl restart waku-simulator-compose
sudo[248149]:  wakusim : PWD=/home/wakusim/webhook ; USER=root ; COMMAND=/bin/systemctl restart waku-simulator-compose
sudo[248149]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1500)
server.py[248029]: INFO - 127.0.0.1 - - [27/Sep/2023 18:35:05] "GET /health HTTP/1.1" 200 -
sudo[248149]: pam_unix(sudo:session): session closed for user root
server.py[248029]: INFO - Command success:
server.py[248029]: b''

I consider this done. If there's anything I missed please reopen.

Forgot to update docs:

Also had to fix the Compose project name, which defaulted to repo because of the checkout folder name:

I was hoping I could use the name parameter:

But it appears to be too new for the Docker Compose version we have:

The Compose file './docker-compose.yml' is invalid because:
'name' does not match any of the regexes: '^x-'
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
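
One option that does not depend on the Compose file format is the COMPOSE_PROJECT_NAME variable, which older docker-compose versions also honour and which can live in the `.env` file next to `docker-compose.yml`. This is a sketch of that approach, not necessarily the fix that was applied here; the project name `waku-simulator` is an assumption.

```shell
# .env next to docker-compose.yml (read automatically by Compose)
# Assumed project name; overrides the default derived from the checkout folder name.
COMPOSE_PROJECT_NAME=waku-simulator
```

The same effect is available per invocation with `docker-compose -p waku-simulator up -d`.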