bug: RLNv2 rate limit imposed sooner than expected

Question

bug: RLNv2 rate limit imposed sooner than expected

Opened this issue 3 months ago · 8 comments

Problem

During interop test_valid_payloads_at_slow_rate execution, 20th message sent over LRN relay was rejected with error

ERROR    tests.relay.test_rln:test_rln.py:32 Payload A very long string failed: Error: 400 Client Error: Bad Request for url: http://127.0.0.1:63912/relay/v1/messages/%2Fwaku%2F2%2Frs%2F1%2F0 with response: b'Failed to publish: RLN validation failed'

Impact

Low (revised from High) occurrence, high impact.

To reproduce

Please checkout waku-org/waku-interop-tests@6b10c18
cd waku-interop-tests
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pre-commit install
pytest tests/relay/test_rln.py -k 'test_valid_payloads_at_slow_rate'

Expected behavior

All 100 messages delivered as expected by configuration rln_relay_user_message_limit=100, rln_relay_epoch_sec=600.

Screenshots/logs

test.log
node1_2024-07-31_18-06-09__bb5caa43-2be1-4c92-b006-c674bb4c2814__wakuorg_nwaku:latest.log
node1_2024-07-31_18-06-09__eba792d6-3fc8-4662-8afe-2c127d016536__wakuorg_nwaku:latest.log
node1_2024-07-31_18-06-24__42b05617-7bc8-4300-bd23-317e2a311bfc__wakuorg_nwaku:latest.log
node2_2024-07-31_18-06-09__eba792d6-3fc8-4662-8afe-2c127d016536__wakuorg_nwaku:latest.log

nwaku version/commit hash

NWaku - wakunode2-v0.31.0-rc.1-10-ge4e01f (e4e01fa)

Answer 1 · 2024-07-31T12:07:29.000Z

Thanks, I managed to reproduce it with the interop tests.

The problem is here:

WRN 2024-07-31 11:59:51.671+00:00 invalid message: invalid proof topics="waku rln_relay" tid=1 file=rln_relay.nim:240 payloadLen=26

https://github.com/waku-org/nwaku/blob/v0.31.1-rc.0/waku/waku_rln_relay/rln_relay.nim#L240

I have seen everything in the past (invalid root, spam, etc) but not sure under which circumstances a proof can be invalid, assuming honest nwaku nodes. Quite interesting I can confirm it happens always at message index 20. I have even tried modifying the content of such message, but still the same problem.

Will further investigate.

Answer 2 · 2024-07-31T12:39:51.000Z

Seems the issue is the following:

Since rln-relay-dynamic is not set, "static RLN" is used. Which is a hardcoded set of memberships where 20 is hardcoded as the max limit.

Solution:

Use rln-relay-dynamic=true to use the onchain RLN version, which allows to configure parameters such as user message limit.
Or if you want to use static RLN with a configurable user message limit, we would need to add support for this, since currently is hardcoded to 20.

Changes for it to work with onchain rln aka rln-relay-dynamic=true

Answer 3 · 2024-07-31T12:47:44.000Z

If you want to have a static RLN with configurable message size, we can add a flag for this:

https://github.com/waku-org/nwaku/blob/v0.31.1-rc.0/waku/waku_rln_relay/group_manager/static/group_manager.nim#L36

(right now is hardcoded to 20)

Answer 4 · 2024-07-31T12:50:47.000Z

To close this issue as well:

This would have been prevented if nwaku errored when providing both a contract/eth-endpoint and using static RLN. This configuration does not make sense and should fail.

Answer 5 · 2024-07-31T13:01:12.000Z

To close this issue as well:

This would have been prevented if nwaku errored when providing both a contract/eth-endpoint and using static RLN. This configuration does not make sense and should fail.

I agree with your findings. May I propose we re-qualify this issue to improvement ? In the meantime, I will change my tests to filter out unnecessary params.

Answer 6 · 2024-07-31T13:05:37.000Z

If you want to have a static RLN with configurable message size, we can add a flag for this:

https://github.com/waku-org/nwaku/blob/v0.31.1-rc.0/waku/waku_rln_relay/group_manager/static/group_manager.nim#L36

(right now is hardcoded to 20)

This proposal will self document itself. Looks good to me.

Answer 7 · 2024-08-09T12:56:28.000Z

So to be sure, we should:

1- return an error if a node is run with static RLN + providing a contract/endpoint
2- make static RLN message limit configurable

Is it correct? @romanzac

Answer 8 · 2024-08-09T14:29:18.000Z

So to be sure, we should:

1- return an error if a node is run with static RLN + providing a contract/endpoint 2- make static RLN message limit configurable

Is it correct? @romanzac

Yes, if static RLN is not useful for troubleshooting of dynamic RLN normally used in production. When there is an additional need to remove config params while troubleshooting, it adds work. I would suggest warning message and ignore the contract/endpoint. Plus make static RLN message limit configurable.