bug: RLNv2 in dynamic mode imposed rate limit after the first message while 100 configured
Closed this issue · 18 comments
Problem
During test_valid_payloads_dynamic_at_slow_rate execution, 2nd message sent over RLN relay was rejected with error. rln_relay_user_message_limit set to 100.
ERROR tests.relay.test_rln:test_rln.py:128 Payload An integer failed: Error: 400 Client Error: Bad Request for url: http://127.0.0.1:20573/relay/v1/messages/%2Fwaku%2F2%2Frs%2F1%2F0 with response: b'Failed to publish: RLN validation failed'
Impact
Low (revised from High) occurrence, high impact.
To reproduce
- Please checkout waku-org/waku-interop-tests@1eef2fb
- cd waku-interop-tests
- python -m venv .venv
- source .venv/bin/activate
- pip install -r requirements.txt
- pre-commit install
- have credentials file .env ready in the repo root
- pytest tests/relay/test_rln.py -k 'test_valid_payloads_dynamic_at_slow_rate'
Expected behavior
All 100 messages delivered as expected by configuration rln_relay_user_message_limit=100, rln_relay_epoch_sec=600.
Screenshots/logs
test.log
node1_2024-08-01_20-53-13__84dd380f-529e-4576-8330-4c164bfc278e__wakuorg_nwaku:latest.log
node1_2024-08-01_20-53-13__460fe439-ce68-41f4-ae66-5f1cef92b882__wakuorg_nwaku:latest.log
node1_2024-08-01_20-53-25__2bc49263-632e-47b7-b756-48e6a2d367fb__wakuorg_nwaku:latest.log
node2_2024-08-01_20-53-13__84dd380f-529e-4576-8330-4c164bfc278e__wakuorg_nwaku:latest.log
nwaku version/commit hash
NWaku - wakunode2-v0.31.0-rc.1-10-ge4e01f (e4e01fa)
did you generate the keystore with the userMessageLimit set to 100?
did you generate the keystore with the userMessageLimit set to 100?
No. Is there some meaning to tie keystore and operational parameter ?
did you generate the keystore with the userMessageLimit set to 100?
No. Is there some meaning to tie keystore and operational parameter ?
Yes, the membership that is registered while running the keystore generator also accepts the rln-relay-user-message-limit
as part of it
This is then enforced within waku-rln-relay
did you generate the keystore with the userMessageLimit set to 100?
No. Is there some meaning to tie keystore and operational parameter ?
Yes, the membership that is registered while running the keystore generator also accepts the
rln-relay-user-message-limit
as part of itThis is then enforced within waku-rln-relay
How would anyone know if something is enforced when registration went just fine ? Is there anything else of this kind ?
I'd suggest to log warning when default rln-relay-user-message-limit
value was used or ideally we don't need to specify it during registration at all. It is quite unusual to put this into keystore. Maybe it is not a keystore anymore, but some kind of a token, client side of a contract? I cannot see anything else being unusual there, ok.
When I set rln-relay-user-message-limit=100
during registration and then I set rln-relay-user-message-limit=1
for node operation, I will get limited after sending 1 message. It looks bumping default for rln-relay-user-message-limit
to maximum supported message limit might do the trick ? @rymnc WDYT ?
DEBUG tests.relay.test_rln:test_rln.py:125 Sending message No. #2
INFO src.node.api_clients.base_client:base_client.py:37 curl -v -X POST "http://127.0.0.1:42709/relay/v1/messages/%2Fwaku%2F2%2Frs%2F1%2F0" -H "Content-Type: application/json" -d '{"payload": "MTIzNDU2Nzg5MA==", "contentTopic": "/test/1/waku-rln-relay/proto", "timestamp": '$(date +%s%N)'}'
ERROR src.node.api_clients.base_client:base_client.py:16 HTTP error occurred: 500 Server Error: Internal Server Error for url: http://127.0.0.1:42709/relay/v1/messages/%2Fwaku%2F2%2Frs%2F1%2F0. Response content: b'Failed to publish: error appending RLN proof to message: could not get new message id to generate an rln proof: NonceLimitReached: Nonce limit reached. Please wait for the next epoch. requested nonce: 1 & nonceLimit: 1'
ERROR tests.relay.test_rln:test_rln.py:128 Payload An integer failed: Error: 500 Server Error: Internal Server Error for url: http://127.0.0.1:42709/relay/v1/messages/%2Fwaku%2F2%2Frs%2F1%2F0 with response: b'Failed to publish: error appending RLN proof to message: could not get new message id to generate an rln proof: NonceLimitReached: Nonce limit reached. Please wait for the next epoch. requested nonce: 1 & nonceLimit: 1'
Hey @romanzac, going over the issue, what's the current action item for this?
To log the limit when performing the registration?
For TWN, rlnRelayUserMessageLimit
is set to 100. I'm not sure if we should care about this node setting outside of TWN? As RLN is only defined for it
Hey @romanzac, going over the issue, what's the current action item for this? To log the limit when performing the registration?
For TWN,
rlnRelayUserMessageLimit
is set to 100. I'm not sure if we should care about this node setting outside of TWN? As RLN is only defined for it
Hi, may I know what is TWN ? I would like to collect more information first before making a conclusion, together with you, where to make a trade off. To provide message limit during registration is not intuitive, I don't understand yet why was that necessary ? We have RLN registration separated from RLN operation. Providing operational parameter during registration blurs the conceptual boundary. Do we need the registration as a separate step then ?
rlnRelayUserMessageLimit
I found TWN = TheWakuNetworkConf in the code. This is a piece of code convenient for IFT running Waku network. I wonder how these values could live in codebase? And second question is about how many Waku networks do we expect the World would ever run ? If more than 1 Waku network is going to run, I would encourage to remove such values from codebase. And we optimize configuration to many Waku networks which may run in the future. If not for anything, at least to not chose strategies which demonstrate our low confidence in our own work!
Hi, may I know what is TWN ?
Yes, The Waku Network (aka TWN) is the public spam protected network we are building at Waku. Anyone can use Waku on different networks, but TWN is Waku's main and only spam protected network.
how many Waku networks do we expect the World would ever run ?
To my knowledge, there is supposed to be only one Waku network (TWN) with RLN enabled. Waku can be used by individuals as they please on different networks and can setup RLN on their own by deploying their own contracts. But I think that we only care about TWN, RLN for a different network is not easily configurable, and not even sure if it should be.
Please correct me if I'm wrong @jm-clius @alrevuelta
Hi @romanzac the Waku Network is a recommended instantiation of Waku - it is not owned or controlled by anyone, but simply an opinionated "default" that allows for a global, shared, p2p communications infrastructure. Think of it as somewhat analogous to the Ethereum mainnet. It's always possible to fork and build your own isolated network, but for most use cases it is not recommended. The Waku Network is specified (and rationale explained) here: https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/64/network.md
Hi @romanzac the Waku Network is a recommended instantiation of Waku - it is not owned or controlled by anyone, but simply an opinionated "default" that allows for a global, shared, p2p communications infrastructure. Think of it as somewhat analogous to the Ethereum mainnet. It's always possible to fork and build your own isolated network, but for most use cases it is not recommended. The Waku Network is specified (and rationale explained) here: https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/64/network.md
There were networks before and after Ethereum which haven't reached adoption. I see we might be actually overconfident in a wrong place on time on adoption curve. I suggest we keep the upside of our work at the maximum by:
- Keep TWN, it is a first of Waku instances and serves anyone who wants to build app only
- Keep Waku source code clean and ready to be modified or adopted by anyone else.
This way we can maximize our chances Waku could get adopted in a way we cannot predict. We keep this open until there is a clarity on where the adoption might go. It is alright to have opinions, but it not the best use of resources, when unforseable applications of Waku are trimmed off too early.
Is it that costly to have clean codebase ready for modifications ?
It seems to me this is just a matter of slightly confusing config items.
-
We provide a convenience tool to register an RLN membership. You provide the contract address, desired rate limit, and some other necessary config and the tool registers a membership at the configured contract and generates a keystore. There is currently only one deployed contract, but the tool already allows anyone to deploy their own RLN contract and use that instead. This tool is separate from the node.
-
We run a node. If you have your own RLN membership you can use this node to publish to whichever Waku configuration, network (private or public) you're connected to (or even choose not to configure and use RLN at all). If you choose to connect to any Waku Network that is RLN-protected, you can then set the registered user rate limit so that you can generate valid proofs to attach to your messages that you publish. Note that whichever network you connect to, whether you use RLN or how RLN is configured, is completely up to you.
-
If you choose to connect your node to TWN (i.e. if you set
clusterId=1
for the only specified network with a deployed RLN contract), for now your user settings will be overridden by what has been specified for the Waku Network. You actively choose to override these settings by choosing the Waku Network.
Ideally we would want users to not even see rlnRelayUserMessageLimit
config option if they choose TWN, but note that this is not possible with nim-config
. We may consider logging a warning if there is a user setting that is overridden when connecting to TWN.
It seems to me this is just a matter of slightly confusing config items.
- We provide a convenience tool to register an RLN membership. You provide the contract address, desired rate limit, and some other necessary config and the tool registers a membership at the configured contract and generates a keystore. There is currently only one deployed contract, but the tool already allows anyone to deploy their own RLN contract and use that instead. This tool is separate from the node.
- We run a node. If you have your own RLN membership you can use this node to publish to whichever Waku configuration, network (private or public) you're connected to (or even choose not to configure and use RLN at all). If you choose to connect to any Waku Network that is RLN-protected, you can then set the registered user rate limit so that you can generate valid proofs to attach to your messages that you publish. Note that whichever network you connect to, whether you use RLN or how RLN is configured, is completely up to you.
- If you choose to connect your node to TWN (i.e. if you set
clusterId=1
for the only specified network with a deployed RLN contract), for now your user settings will be overridden by what has been specified for the Waku Network. You actively choose to override these settings by choosing the Waku Network.Ideally we would want users to not even see
rlnRelayUserMessageLimit
config option if they choose TWN, but note that this is not possible withnim-config
. We may consider logging a warning if there is a user setting that is overridden when connecting to TWN.
Thanks for thorough description. Please note waku-interop-tests don't test with TWN config. Entire test suite is designed from independent node provider point of view. We could log warning as you have proposed. It is unfortunate, from my point of view, if single design cannot satisfy both TWN and independent node provider, but I can only accept your decision.
If your product strategy is affected by instances similar to: "that this is not possible with nim-config
" we should probably escalate this to get org revisit a decision of phasing out Go-Waku. If there is a risk half(?) of the adoption use cases cannot be realized, because Nim ecosystem immaturity, some corrective action would need to happen.
It is unfortunate, from my point of view, if single design cannot satisfy both TWN and independent node provider, but I can only accept your decision.
Not sure where you got that from. @jm-clius has specified:
Note that whichever network you connect to, whether you use RLN or how RLN is configured, is completely up to you.
It is unfortunate, from my point of view, if single design cannot satisfy both TWN and independent node provider, but I can only accept your decision.
Not sure where you got that from. @jm-clius has specified:
Note that whichever network you connect to, whether you use RLN or how RLN is configured, is completely up to you.
My intention is we can do better than "as is; up to you". My concern is about what @jm-clius hasn't specified. I know well now what is possible, but I don't want others who are not building on TWN should have it harder or outright impossible.
I know well now what is possible, but I don't want others who are not building on TWN should have it harder or outright impossible.
We are still in the early phase of external adoption. and RLN is not even on mainnet yet.
If a project wants to use their own RLN contract, we will assist them and do the necessary update. See "Supporting non-TWN RLN deployment" as a feature yet to be done. Let's get RLN on mainnet first, and then we can review what parameters need to be configurable, and what parameters can remain hardcoded.
After this conclusion, is there any action item for this issue or can it be closed?
We can check one by one if the user changed any of the 11 parameters that we override for TWN and log a warning for each that was changed. I personally think it's an overkill but if you think it adds value I can add it :)
After this conclusion, is there any action item for this issue or can it be closed?
We can check one by one if the user changed any of the 11 parameters that we override for TWN and log a warning for each that was changed. I personally think it's an overkill but if you think it adds value I can add it :)
I suggest to close this issue. Interop tests were fixed to accommodate. These tests could serve as living reference for anyone who can build their own network.