Infra to run waku-simulator on latest nwaku master
alrevuelta opened this issue · 19 comments
In order to detect potential issues in nwaku as soon as possible, we would need an instance of waku-simulator deployed with the latest nwaku master commit. Every time we merge a new PR to nwaku, the waku-simulator tool would be redeployed with that image, so we can monitor whether we are introducing any issues (especially related to networking or performance in general).
waku-simulator allows us to easily:
- Create a network with an arbitrary amount of nwaku nodes (max 250)
- Automatically inject gossipsub traffic into the network with some configurable parameters.
- Monitor said network with an already provisioned grafana dashboard.
What would we need?
- Some infra to run waku-simulator
- Redeploy the setup on every new commit to nwaku master. Unsure if this requires changes in nwaku CI, or perhaps it can be auto-detected?
- A static IP with port :3000 open so that we can visualize the metrics.
Important notes:
- The amount of metrics that waku-simulator generates is quite high, which is why we provide our own custom grafana/prometheus instance. I would suggest not "indexing" these metrics in status infra. To avoid using too much disk space, prometheus retention time is set to 7 days.
- Nodes are running with a simple configuration, where each one should use < 100 MB. A machine with 64 GB should be enough.
- Disk space usage shouldn't be very high: the store protocol is not used, so the only data stored is the prometheus metrics.
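As a rough sanity check on the sizing above, the arithmetic can be sketched as follows. This assumes the stated upper bound of 100 MB per node and treats 1 GB as 1000 MB. The function name is hypothetical and the result ignores whatever Grafana and Prometheus need on top.

```python
# Rough upper-bound memory estimate for a waku-simulator deployment.
# Assumption: each nwaku node uses at most 100 MB, as stated in the notes above.
PER_NODE_MB = 100


def estimate_memory_gb(num_nodes: int, per_node_mb: int = PER_NODE_MB) -> float:
    """Return an upper-bound memory estimate in GB (1 GB = 1000 MB)."""
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    return num_nodes * per_node_mb / 1000


if __name__ == "__main__":
    # 100 is the node count requested in this issue, 250 the simulator maximum.
    for nodes in (100, 250):
        print(f"{nodes} nodes: at most {estimate_memory_gb(nodes):.1f} GB")
```

Under these assumptions, even the 250-node maximum stays well below 64 GB.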
TLDR:
We would need some infra to run waku-simulator so that once a day it deploys the latest nightly nwaku release by executing the following.
This is the repo
git clone https://github.com/waku-org/waku-simulator.git
cd waku-simulator
And only LATEST_MASTER_PLACEHOLDER should be updated.
export NWAKU_IMAGE=statusteam/nim-waku:LATEST_MASTER_PLACEHOLDER
export NUM_NWAKU_NODES=100
export GOWAKU_IMAGE=statusteam/go-waku:v0.7.0
export NUM_GOWAKU_NODES=0
export MSG_PER_SECOND=10
export MSG_SIZE_KBYTES=10
docker-compose up -d
And then have the already provisioned dashboard available at ip:3000.
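The settings above could also be generated for a given image tag and written as KEY=VALUE lines, which is the format Docker Compose reads from a `.env` file. The following is only a sketch: the function names and the `example-tag` value are hypothetical, and the defaults are copied from the exports in this issue.

```python
# Sketch: build the waku-simulator settings for a given nwaku image tag.
# The default values are copied from the exports listed in this issue.
DEFAULTS = {
    "NUM_NWAKU_NODES": "100",
    "GOWAKU_IMAGE": "statusteam/go-waku:v0.7.0",
    "NUM_GOWAKU_NODES": "0",
    "MSG_PER_SECOND": "10",
    "MSG_SIZE_KBYTES": "10",
}


def simulator_env(nwaku_tag: str) -> dict:
    """Return the settings with only the nwaku image tag substituted."""
    if not nwaku_tag or any(ch.isspace() for ch in nwaku_tag):
        raise ValueError("nwaku_tag must be a non-empty tag without whitespace")
    env = {"NWAKU_IMAGE": f"statusteam/nim-waku:{nwaku_tag}"}
    env.update(DEFAULTS)
    return env


def render_env_file(env: dict) -> str:
    """Render the settings as KEY=VALUE lines, one per line."""
    return "".join(f"{key}={value}\n" for key, value in env.items())


if __name__ == "__main__":
    # "example-tag" is a placeholder, not a real image tag.
    print(render_env_file(simulator_env("example-tag")), end="")
```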
cc @jakubgs
Edited: Instead of running waku-simulator on every merge to nwaku master, just run it once a day (see nightly release)
I have confirmed that an AX41-NVMe host from Hetzner will suffice for this.
https://www.hetzner.com/dedicated-rootserver/matrix-ax
Possibly with extra memory in the future.
Looks like Alexis already generalized my script and role for handling GitHub webhooks to update a local repo:
So I can reuse that.
I'm refactoring infra-role-github-webhook to handle running a task after repo update. Should finish tomorrow.
@alrevuelta was setting export NUM_GOWAKU_NODES=0 intentional, or did you mean 10?
Yep, 0. For now we will be focusing only on the nwaku<->nwaku integration.
Here's a PR to allow auto-updates of Docker images:
I'm the one working on this.
Here's the initial setup:
- infra-misc#57001901 - add metal-01.he-eu-hel1.wakusim.misc host
- infra-misc#cb5cf671 - waku-simulator: first working version of setup
- infra-misc#f215ff63 - waku-simulator: add initial README file
- infra-misc#924011c8 - wakusim: add boostrap settings, playbook
What's remaining:
- Expose webhook publicly and configure with GitHub
- Add OAuth-Proxy for Grafana container
- Configure .env symlink to configure containers from repo
- Add Consul healthchecks and additional services
We're going to use a wakusim.env file in the repo to allow devs to adjust settings on the wakusim.misc host:
I have configured Grafana dashboard at https://simulator.waku.org/ using OAuth proxy:
- https://github.com/status-im/infra-misc/commit/6f561143 - wakusim: add oauth-proxy for Grafana instance
We're not using Grafana built-in OAuth because that would require changes in the waku-simulator repo itself.
Also added extra healthchecks:
- https://github.com/status-im/infra-misc/commit/209a210f - waku-simulator: consul grafana and prometheus checks
- https://github.com/status-im/infra-misc/commit/33469bd0 - waku-simulator: create .env symlink later
- https://github.com/status-im/infra-misc/commit/dbb28dc3 - waku-simulator: add healthcheck for compose service
And fixed the location of the .env symlink task.
I have a PR going to migrate from the old statusteam org to wakuorg on Docker Hub:
That is part of a proper setup of automatic builds of the master branch that will push a Docker latest tag.
I've exposed the /webhook path under the same https://simulator.waku.org/ domain using nginx proxy:
- infra-misc#8c71f628 - wakusim: Nginx proxy to combine webhook and Grafana
- infra-misc#bb640c74 - waku-simulator: pass mandatory webhook secret
Also added missing webhook secret.
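For context on why the webhook secret is mandatory: GitHub signs each webhook delivery with an HMAC-SHA256 of the request body and sends it in the `X-Hub-Signature-256` header as `sha256=<hexdigest>`. A minimal sketch of that check is below. Whether the webhook server in the role verifies it exactly this way is an assumption, and the function name is hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, payload: bytes, header_value: str) -> bool:
    """Check a GitHub-style 'sha256=<hexdigest>' signature over the raw body.

    Hypothetical helper; not taken from the actual webhook server.
    """
    if not header_value:
        return False
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


if __name__ == "__main__":
    secret = b"example-secret"  # placeholder, not a real secret
    body = b'{"ref": "refs/heads/master"}'
    header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(signature_is_valid(secret, body, header))
```

The signature has to be computed over the raw request bytes, before any JSON parsing, otherwise re-serialization differences can make valid deliveries fail the check.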
Here are all the github-webhook Ansible role changes:
- infra-role-github-webhook#9599fefa - user: use github name for user by default
- infra-role-github-webhook#eb6db1b6 - user: fix naming of groups additional groups var
- infra-role-github-webhook#42af0f6f - service: move description to the template
- infra-role-github-webhook#dcfc06f8 - user: use 1500 UID, let fleets override it
- infra-role-github-webhook#ffaf74fb - server: support for running command after repo update
- infra-role-github-webhook#32f5f2c3 - consul: add missing service ID and port
- infra-role-github-webhook#efa1ccd9 - readme: update config examples and explain options
- infra-role-github-webhook#802cd09d - server: drop appending repo name to repo path
I had to modify the infra-bi repo to make it compatible:
- https://github.com/status-im/infra-bi/pull/58
- https://github.com/status-im/infra-bi/pull/57
- status-im/airflow-dags@a1035be - updating the dbt path following webhook update
Some more fixes for github-webhook running post update action:
- infra-role-github-webhook#4d4f8f29 - service: quote post command in service definition
- infra-role-github-webhook#01bed38f - server: fix import of CalledProcessError from subprocess
- infra-role-github-webhook#9ca408c4 - server: remove post_action argument from on_push
- infra-role-github-webhook#a8cecc02 - server: fix extracting name from repo url
And adjustments to Waku simulator role to allow restarting the compose service:
- infra-misc#af34dbc6 - waku-simulator: restart compose service with sudo
And it works:
server.py[248029]: INFO - jakubgs pushed refs/heads/master in waku-org/waku-simulator (8ce2afca-5d64-11ee-956f-7aa842426278)
server.py[248029]: INFO - New commit available: b58f2b0b31b26a571afe7623c24788ecf775cf9f
server.py[248029]: INFO - Updated repo to: b58f2b0b31b26a571afe7623c24788ecf775cf9f
server.py[248029]: INFO - Running post action!
server.py[248029]: INFO - Running command: /usr/bin/sudo systemctl restart waku-simulator-compose
sudo[248149]: wakusim : PWD=/home/wakusim/webhook ; USER=root ; COMMAND=/bin/systemctl restart waku-simulator-compose
sudo[248149]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1500)
server.py[248029]: INFO - 127.0.0.1 - - [27/Sep/2023 18:35:05] "GET /health HTTP/1.1" 200 -
sudo[248149]: pam_unix(sudo:session): session closed for user root
server.py[248029]: INFO - Command success:
server.py[248029]: b''
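The "Running post action" step in the log above can be sketched roughly as follows. This is not the actual server.py from infra-role-github-webhook. It is a minimal illustration, assuming the post command is a single string that is split into arguments and run with `subprocess.run`. The function and logger names are hypothetical.

```python
import logging
import shlex
import subprocess
from subprocess import CalledProcessError

log = logging.getLogger("webhook-sketch")  # hypothetical logger name


def run_post_action(command: str) -> bool:
    """Run the configured post-update command and report whether it succeeded."""
    args = shlex.split(command)  # split like a shell would, without invoking one
    log.info("Running command: %s", command)
    try:
        # check=True raises CalledProcessError on a non-zero exit status.
        result = subprocess.run(args, check=True, capture_output=True)
    except CalledProcessError as err:
        log.error("Command failed (%d): %r", err.returncode, err.stderr)
        return False
    except FileNotFoundError:
        log.error("Command not found: %s", args[0] if args else "")
        return False
    # stdout is bytes here, which is why an empty output shows up as b''.
    log.info("Command success: %r", result.stdout)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
    run_post_action("/usr/bin/sudo systemctl restart waku-simulator-compose")
```

Passing an argument list instead of using `shell=True` keeps the command from being reinterpreted by a shell, which matters when the command string comes from configuration.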
I consider this done. If there's anything I missed, please reopen.
Forgot to update docs:
Also had to fix the project name, which was repo due to the checkout folder name:
- infra-misc#4f9f3518 - waku-simulator: set COMPOSE_PROJECT_NAME in service
I was hoping I could use the name parameter:
But it appears to be too new for the Docker Compose version we have:
The Compose file './docker-compose.yml' is invalid because:
'name' does not match any of the regexes: '^x-'
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
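Since the top-level `name` key is rejected by this Compose version, the project name has to come from outside the file. By default Compose derives it from the directory name, and the `COMPOSE_PROJECT_NAME` environment variable overrides that. The commit above sets the variable in the service definition. The sketch below shows the same idea from Python, with a hypothetical helper and a hypothetical project name.

```python
import os
from typing import Mapping, Optional


def compose_env(project_name: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with COMPOSE_PROJECT_NAME set.

    Hypothetical helper; the real setup sets the variable in the service unit.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["COMPOSE_PROJECT_NAME"] = project_name
    return env


if __name__ == "__main__":
    # "waku-simulator" is an assumed project name, used only for illustration.
    env = compose_env("waku-simulator")
    print(env["COMPOSE_PROJECT_NAME"])
    # The environment would then be passed to the compose invocation, e.g.:
    # subprocess.run(["docker-compose", "up", "-d"], env=env, check=True)
```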