Deploy websockify with nim-waku nodes in test fleet
Closed this issue · 31 comments
Problem
- nim-waku does not support secure websocket connectivity.
- js-waku in the browser only supports secure websocket connections
Solution
- Get certificates using letsencrypt
- Deploy and run websockify to proxy nim-waku tcp connections and wrap them with SSL+Websocket
Steps
For each wakunode2 binary deployed in the fleet, considering wakunode2 is deployed on <testmachine.statusim.net>, listening on tcp port <tcp_port>, on the same machine:
- Install certbot:
sudo apt install certbot
- Get certificates:
sudo certbot certonly -d <testmachine.statusim.net>
- Install websockify
sudo apt install websockify
- Start websockify
sudo websockify --cert /etc/letsencrypt/live/<testmachine.statusim.net>/fullchain.pem --key /etc/letsencrypt/live/<testmachine.statusim.net>/privkey.pem 0.0.0.0:443 127.0.0.1:<tcp_port>
- Register the websocket address to https://fleets.status.im/
Value should be: /dns4/<testmachine.statusim.net>/tcp/443/wss/p2p/<peer-id>, with the same peer-id value used in the current JSON file.
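As a rough sketch of the last step, the websocket multiaddress can be assembled from the host name and the peer id. The helper name wss_multiaddr is hypothetical and not part of nim-waku or websockify; the port defaults to 443 as in the steps above.

```shell
# Build a secure-websocket multiaddress.
# $1 = DNS name, $2 = peer id, $3 = optional port (defaults to 443).
# wss_multiaddr is a hypothetical helper name used only for illustration.
wss_multiaddr() {
  printf '/dns4/%s/tcp/%s/wss/p2p/%s\n' "$1" "${3:-443}" "$2"
}

# Example using the do-ams3 test node that appears later in this thread.
wss_multiaddr node-01.do-ams3.wakuv2.test.statusim.net \
  16Uiu2HAmPLe7Mzm8TsYUubgCAW1aJoeFScxrLj8ppHFivPo97bUZ
```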
Notes:
a. If the file system is preserved between deployments, steps 1, 2 & 3 may not be necessary.
b. Not sure what will be the best way to embed this new value in https://fleets.status.im/
c. Once this is confirmed as working as expected on the test fleet, I'll open an issue for a similar setup on the prod fleet.
d. When running certbot, it detects that a webserver is present and proposes several methods to complete the challenge.
Commented in Discord already but: Seems fine, I'd probably put this under a separate namespace from current nim-waku test nodes but still under wakuv2.test, then connect to the other nim-waku nodes. This means one can choose whether to connect to nim-waku, go-waku or all, and triggers aren't messed up.
put this under a separate namespace from current nim-waku test nodes but still under wakuv2.test
How would that work? If it's a separate fleet it should have a separate fleet name. Like gowaku.test or something.
then connect to the other nim-waku nodes
By "other nim-waku nodes" do you mean all the ones we run, or just the specific fleet: wakuv2.test?
How many hosts does it need? Does it need to be spread across multiple DCs? How beefy should the hosts be?
I imagine one is enough, it would act as a bridge to let websocket only clients access the waku test fleets. At this stage the only client is a webapp I use for dogfooding. Other users would only use the webapp as a showcase for js-waku.
Like gowaku.test or something.
Yes
just the specific fleet: wakuv2.test?
How many hosts does it need? Does it need to be spread across multiple DCs? How beefy should the hosts be?
1 seems good enough for now
After further review, we need secure websocket support too (wss) to make the most out of a cluster deployment of go-waku.
We are currently tracking that in waku-org/go-waku#21.
Until this is done, the cluster deployment would not be that useful, and if it cannot be done in Go then the deployment would involve using a webserver for the TLS layer.
Closing this until waku-org/go-waku#21 is resolved.
Thanks to @richard-ramos who already did the wss investigation on the go side, we know that for now, we have to use a webserver to add the TLS layer as go-libp2p is not ready for wss native support. See waku-org/go-waku#21 (comment) for details.
By any chance, would it be possible to include a webserver (nginx or other) as part of the deployment of go-waku?
@richard-ramos mentioned that we should be able to set up subdomains for go-waku instances, like go-waku1.status.im, go-waku2.status.im, etc, and use the status.im certificate. That way we could have https and not have to deal with self-signed certificates.
Hm, if possible I'd go for nginx + letsencrypt script. Making it part of status.im is kind of going against the p2p architecture, and these types of design decisions tend to stick around in a bad way. What are the downsides of self-signed certificates here? From a protocol design POV, status.im should have no privileged position whatsoever.
Self-signed certificates are fine. However, AFAIK, you cannot use letsencrypt for IP certificates, which means you would need a domain name. Is that an option?
Please note that we tried out an IP self-signed certificate, generated via openssl (@richard-ramos please confirm), and it worked fine without having to install a CA.
Yes, I created the self signed certificate using openssl. I was expecting chrome to complain about the validity of the certificate, but surprisingly it worked with no issues.
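The exact openssl invocation used is not recorded in this thread. One plausible way to produce a self-signed certificate for an IP address is sketched below; the address 203.0.113.10 is a documentation-range placeholder, the file locations are arbitrary, and -addext assumes OpenSSL 1.1.1 or newer.

```shell
# Generate a self-signed certificate whose subjectAltName is an IP address.
# The IP and the temporary directory are placeholders, not values from this thread.
ip="203.0.113.10"
workdir="$(mktemp -d)"

openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$workdir/key.pem" -out "$workdir/cert.pem" \
  -days 30 -subj "/CN=$ip" \
  -addext "subjectAltName=IP:$ip"

# Show the subjectAltName entry that browsers check against the address.
openssl x509 -in "$workdir/cert.pem" -noout -text | grep -A1 'Subject Alternative Name'
```

The resulting cert.pem and key.pem could then be passed to a TLS-terminating proxy in the same way as the letsencrypt files.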
Please note that I am now investigating the use of websockify directly with nim-waku. This could add an SSL+websocket layer to nim-waku, removing the need to deploy go-waku and nginx.
Best to hold off on this present issue for now.
FYI @oskarth @jm-clius @jakubgs I have updated the description for a new strategy.
At the moment I am proposing to use letsencrypt certificates, simply because I find it more straightforward and I already know how to do it.
@oskarth if you feel strongly about the ip certificates, let me know and I can revise the proposal after I test them out myself.
Whatever is easiest for now, the guiding principle should be that we eventually ideally want it to work for end-users running Status desktop, or similar. Alternatively - KISS setup for VPS/spare server.
Does it have to use LetsEncrypt? It would be probably simpler if we just used CloudFlare certs.
Any valid SSL certificate should do it. Please note that websockify takes the full chain in the --cert argument: --cert /etc/letsencrypt/live/<testmachine.statusim.net>/fullchain.pem so you may have to concatenate the certificates. Let me know if you hit any issue.
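Concatenating the certificates can be as simple as the sketch below, assuming the provider hands out the leaf certificate and the intermediate chain as two separate PEM files. The helper name make_fullchain and the file names in the usage comment are hypothetical.

```shell
# Join a leaf certificate and its intermediate chain into one PEM file.
# Order matters: the leaf certificate comes first, intermediates follow.
# $1 = leaf certificate, $2 = intermediate chain, $3 = output file.
make_fullchain() {
  cat "$1" "$2" > "$3"
}

# Usage (paths are placeholders):
#   make_fullchain cert.pem chain.pem fullchain.pem
#   websockify --cert fullchain.pem --key privkey.pem 0.0.0.0:443 127.0.0.1:<tcp_port>
```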
Does it have to use LetsEncrypt? It would be probably simpler if we just used CloudFlare certs.
Keep in mind that we want to have this be deployable by random users. CloudFlare might make sense for the specific cluster setup, but in general we want to make this simple to use from a random computer/VPS. Afaik, letsencrypt is more scriptable and doesn't require one to set up an account, etc.
If either works of course that's more convenient.
This repo is for our internal use, and not for external use. I have limited time and CloudFlare setup requires less work which is why I will use it.
Keep in mind that we want to have this be deployable by random users.
I am not sure this is the right place for this discussion. Do you want me to add some documentation on how to use websockify for nim-waku users?
Repo: Right, agree, this is specifically about cluster setup and not generic setup scripts for nim-waku, so probably wrong place to bring this up.
Do you want me to add some documentation on how to use websockify for nim-waku users?
Probably a good idea, since we should optimize for normal users first and cluster second. Can be a basic md doc in docs folder I suppose?
I have deployed a setup where I pass the origin certs directly to Websockify via --cert and --key, but that fails with:
EOF occurred in violation of protocol (_ssl.c:1131)
And wsping returns:
Error: unable to verify the first certificate.
Related issues:
And nothing of any value in them. Maybe Nginx setup makes more sense.
Oh, I see what's happening. This would require a nested wildcard certificate with multiple layers of nested domains, which CloudFlare doesn't seem to support, at least not in a format that would work for all hosts, like *.*.*.*.statusim.net.
So I guess it is LetsEncrypt anyway. But this is useful to know for future setups involving full hostnames.
There's this role, which uses the standalone method: https://github.com/geerlingguy/ansible-role-certbot
As seen here:
certbot_create_command: >-
  {{ certbot_script }} certonly --standalone --noninteractive --agree-tos
  --email {{ cert_item.email | default(certbot_admin_email) }}
  -d {{ cert_item.domains | join(',') }}
And it appears to work by stopping Nginx temporarily. And they do auto-renewal via cron:
https://github.com/geerlingguy/ansible-role-certbot/blob/master/tasks/renew-cron.yml
I'd prefer a systemd timer. I might do our own role.
The tricky thing is, if I run certbot on the host the certificates get created with owner as root, and the private key cannot be accessed from the dockremap user within the container. But if I run certbot as a non-root user it cannot use the 80 and 443 ports in the standalone mode.
An alternative to allowing certbot to take the 80 and 443 ports is using --manual and exposing the right URL yourself.
This is possible using the --manual-auth-hook and --manual-cleanup-hook flags:
https://certbot.eff.org/docs/using.html#pre-and-post-validation-hooks
But for that to work a non-root user would have to be able to enable and disable an Nginx site config.
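For illustration, an auth hook for the HTTP challenge could look roughly like the sketch below. certbot exports CERTBOT_TOKEN and CERTBOT_VALIDATION to the hook; the function name write_challenge and the web root argument are assumptions, and the Nginx site that serves that directory is not shown.

```shell
# Write the HTTP-01 challenge file where a web server can serve it.
# $1 = directory served as the site root (hypothetical layout).
# CERTBOT_TOKEN and CERTBOT_VALIDATION are set by certbot when it runs the hook.
write_challenge() {
  challenge_dir="$1/.well-known/acme-challenge"
  mkdir -p "$challenge_dir"
  printf '%s' "$CERTBOT_VALIDATION" > "$challenge_dir/$CERTBOT_TOKEN"
}

# Matching cleanup step, to be called from the --manual-cleanup-hook script.
remove_challenge() {
  rm -f "$1/.well-known/acme-challenge/$CERTBOT_TOKEN"
}
```

Each function would be called from its own small script passed to --manual-auth-hook and --manual-cleanup-hook respectively.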
Here's the Websockify config without SSL: f4a2674c
I forked geerlingguy/ansible-role-certbot and created https://github.com/status-im/infra-role-certbot, and made some changes:
- status-im/infra-role-certbot@9eb5685b - support running certbot via docker container
- status-im/infra-role-certbot@3d768062 - refactor to drop non-ubuntu OSes, use systemd timer
And then configured the nim-waku role to make use of certbot via Docker, so the certificates are owned by dockremap:
I've created the certificates using certbot in a Docker container and yet I'm still getting this error:
EOF occurred in violation of protocol (_ssl.c:1131)
It's always something...
I tried adding py3-openssl py3-asn1 py3-ndg_httpsclient packages to the Docker image as suggested but it didn't work.
Ah, there we have it, you have to pass fullchain.pem to the --cert flag, instead of the cert.pem from LetsEncrypt:
> ~/temp/node_modules/.bin/wsping wss://node-01.do-ams3.wakuv2.test.statusim.net/
Successfully connected to wss://node-01.do-ams3.wakuv2.test.statusim.net/.
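A quick way to tell the two files apart is to count the certificates in each. This is a minimal sketch with a hypothetical helper name, count_certs; it relies only on the PEM header line.

```shell
# Count the certificates contained in a PEM file.
# $1 = path to the PEM file.
count_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Usage (paths follow the letsencrypt layout used earlier in this issue):
#   count_certs /etc/letsencrypt/live/<testmachine.statusim.net>/cert.pem
#   count_certs /etc/letsencrypt/live/<testmachine.statusim.net>/fullchain.pem
```

cert.pem holds only the leaf certificate, while fullchain.pem also carries the intermediates, so a count of 1 suggests the wrong file is being passed to --cert.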
Here's the changes that implement SSL using the new infra-role-certbot: 0de8519e
The Consul definition also includes the multiaddress:
admin@node-01.do-ams3.wakuv2.test:~ % jq '.services[] | select(.name == "nim-waku-v2-websocket").meta' /etc/consul/service_nim_waku_v2.json
{
  "node_enode": "/dns4/node-01.do-ams3.wakuv2.test.statusim.net/tcp/443/wss/p2p/16Uiu2HAmPLe7Mzm8TsYUubgCAW1aJoeFScxrLj8ppHFivPo97bUZ"
}
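Tools such as wsping take a wss:// URL rather than a multiaddress. The conversion could be sketched as follows; multiaddr_to_url is a hypothetical helper that only handles the /dns4/<host>/tcp/<port>/wss/... shape shown above.

```shell
# Convert a /dns4/<host>/tcp/<port>/wss/... multiaddress into a wss:// URL.
# $1 = multiaddress. Splitting on "/" leaves field 1 empty, so the
# protocol name is field 2, the host field 3, the port field 5.
multiaddr_to_url() {
  proto=$(printf '%s\n' "$1" | cut -d/ -f2)
  host=$(printf '%s\n' "$1" | cut -d/ -f3)
  port=$(printf '%s\n' "$1" | cut -d/ -f5)
  scheme=$(printf '%s\n' "$1" | cut -d/ -f6)
  if [ "$proto" != dns4 ] || [ "$scheme" != wss ]; then
    echo "not a dns4 wss multiaddress: $1" >&2
    return 1
  fi
  printf 'wss://%s:%s/\n' "$host" "$port"
}

multiaddr_to_url /dns4/node-01.do-ams3.wakuv2.test.statusim.net/tcp/443/wss/p2p/16Uiu2HAmPLe7Mzm8TsYUubgCAW1aJoeFScxrLj8ppHFivPo97bUZ
```

The URL includes the port explicitly; since 443 is the default for wss, it points at the same endpoint as the wsping call above.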
And here's the update to the Python script that generates contents of https://fleets.status.im/: https://github.com/status-im/infra-eth-cluster/commit/9cf182c1
Looks good:
> curl -s https://fleets.status.im/ | jq '.fleets["wakuv2.test"]'
{
  "waku": {
    "node-01.ac-cn-hongkong-c.wakuv2.test": "/ip4/47.242.210.73/tcp/30303/p2p/16Uiu2HAmSyrYVycqBCWcHyNVQS6zYQcdQbwyov1CDijboVRsQS37",
    "node-01.do-ams3.wakuv2.test": "/ip4/134.209.139.210/tcp/30303/p2p/16Uiu2HAmPLe7Mzm8TsYUubgCAW1aJoeFScxrLj8ppHFivPo97bUZ",
    "node-01.gc-us-central1-a.wakuv2.test": "/ip4/104.154.239.128/tcp/30303/p2p/16Uiu2HAmJb2e28qLXxT5kZxVUUoJt72EMzNGXB47Rxx5hw3q4YjS"
  },
  "waku-websocket": {
    "node-01.ac-cn-hongkong-c.wakuv2.test": "/dns4/node-01.ac-cn-hongkong-c.wakuv2.test.statusim.net/tcp/443/wss/p2p/16Uiu2HAmSyrYVycqBCWcHyNVQS6zYQcdQbwyov1CDijboVRsQS37",
    "node-01.do-ams3.wakuv2.test": "/dns4/node-01.do-ams3.wakuv2.test.statusim.net/tcp/443/wss/p2p/16Uiu2HAmPLe7Mzm8TsYUubgCAW1aJoeFScxrLj8ppHFivPo97bUZ",
    "node-01.gc-us-central1-a.wakuv2.test": "/dns4/node-01.gc-us-central1-a.wakuv2.test.statusim.net/tcp/443/wss/p2p/16Uiu2HAmJb2e28qLXxT5kZxVUUoJt72EMzNGXB47Rxx5hw3q4YjS"
  }
}
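A client that only needs the websocket addresses can pull them out of this JSON with jq. The sketch below assumes the structure shown above stays as is; the function name wss_addrs is hypothetical.

```shell
# Print the websocket multiaddresses of one fleet, one per line.
# Reads the fleets JSON on stdin; $1 = fleet name, e.g. wakuv2.test.
wss_addrs() {
  jq -r --arg fleet "$1" '.fleets[$fleet]["waku-websocket"][]'
}

# Usage:
#   curl -s https://fleets.status.im/ | wss_addrs wakuv2.test
```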
@D4nte please check it out.
Yes, looks great, thanks!
Can you please direct me where I can open an issue to get CORS Header on fleets.status.im so that any dapp can query fleets.status.im to retrieve the addresses?
Well, as you saw in my previous comment the change that added those services to fleets.status.im was https://github.com/status-im/infra-eth-cluster/commit/9cf182c1 done in the infra-eth-cluster repo, so that's where an issue would go.
But I already fixed that: https://github.com/status-im/infra-eth-cluster/commit/b3554e59
> curl -sLi fleets.status.im | grep access-control-allow-origin
access-control-allow-origin: *
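The same check can be turned into a pass/fail test, for example in a monitoring script. This is a sketch only; allows_any_origin is a hypothetical helper that reads HTTP response headers on stdin.

```shell
# Succeed if the response headers on stdin allow any origin via CORS.
# Carriage returns are stripped because HTTP header lines end in CRLF.
allows_any_origin() {
  tr -d '\r' | grep -qi '^access-control-allow-origin:[[:space:]]*\*[[:space:]]*$'
}

# Usage:
#   curl -sLI https://fleets.status.im/ | allows_any_origin && echo "CORS open"
```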