pokt-network/pocket-core

[RFP/Improvement] Allowing for configurable session block height tolerance

nodiesBlade opened this issue · 6 comments

Please explain the goal of the RFP.
Pocket Core should allow for an allowance/tolerance of n (configurable) session block difference when validating relays.

Please provide a justification for your RFP.

Currently, the client requires that the dispatch nodes (Portal) and servicer nodes be on the same "latest" block height (defintion of latest here) before servicing a relay. If both nodes are not synced on same session block, the servicer will return invalid block height errors. While this can help with synchronization attacks, it has caused numerous relays to be returned as invalid and affects our app's perceived QoS as seen as part of these poktscan errors:
Screen Shot 2022-07-31 at 1 14 12 PM

Scenario

Dispatch node A is lagging behind (SessionBlockHeight: 1)
Servicer node B is able to stay synced (SessionBlockHeight: 5).

Since Pocket node now has a different "latest" session block height, it will no longer service incoming relays from apps that use dispatch node A.

Please provide a success criteria for proposals responding to this RFP.

  • Allows for wiggle room on session block height when validating a relay
  • Configurable via client setting, default value should be at 1 session block tolerance

Add aditional context regarding this RFP.
I would add onto this, session block height as a form of time synchronization is as not as effective, so the ask to change this behavior and the implied risks seems low. If we allow for a tolerance of ~1 session block height difference, we're allowing allowing for the time window to be increased to 8 blocks instead of 4.

I think we can take it up a notch and minimize this window further by allowing for a flexible sessionBlockHeightTolerance, resulting in a time window increase of one block.

Psuedo code:

If latestBlockHeight % 4 == 0 then
           sessionBlockHeightTolerance = 1
else
          sessionBlockHeightTolerance = 0

By doing this, we allow for dispatch nodes or pocket nodes to catch up within a single block before making the tolerance more stricter.

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/federating-our-altruist-network-and-servicer-nodes/3958/6

@PoktBlade This thread was brought to my attention so I wanted to follow up and ask a few things for my personal understanding. I understand that session fuzzing has been an ongoing issue.

Defining the Problem

Q1: Just looking for a 👍 that this diagram captures the problem you're outlining?

I looked at how the session block height is computed and see that it's always floored to the nearest % 4.

sequenceDiagram
   title "Session Fuzzing"

   participant A as App
   participant G as Gateway
   participant D as Dispatcher   
   participant S as Servicer<br>block_height=100<br>sess_height=97

   A ->> G: relay
   alt distpatcher_height=99
       G ->> +D: dispatch
       D ->> -G: [Servicer]<br>sess_height=97
       G ->> +S: request(sess_height=97)
       S -->> S: request.sess_height == servicer.sess_height
       S ->> -G: response(200)
       G ->> A: response(200)
   else distpatcher_height=96
       G ->> +D: dispatch
       D ->> -G: [Servicer]<br>sess_height=93
       G ->> +S: request(sess_height=93)
       S -->> S: request.sess_height != servicer.sess_height
       S ->> -G: response(CodeInvalidBlockHeightError= 60)
       G ->> A: response(500)
   end
Loading

Potential Implementation

Q2: Since you have spent more time looking at this, I was wondering if you see any issues here?

I saw the implementation you proposed, but after inspecting the code, was wondering if you see any issues with changing this line to:

    sessionFuzzTolerance := math.Abs((r.Proof.SessionBlockHeight - sessionBlockHeight)/numBlocksPerSession)
	if sessionFuzzTolerance > 1 {
		return sdk.ZeroInt(), NewInvalidBlockHeightError(ModuleName)
	}

Pros:

  • We hard-code to a fuzz of 1 session
  • No configs
  • Literally a 2 line change

Client Configuration

Q3: If you can provide some code pointers, it would really help.

Configurable via client setting, default value should be at 1 session block tolerance

I don't fully understand how you intended for this to be client configurable:

  1. How is the config propagated?
  2. Where would this config be set?
  3. Who'd be setting it?

Other

For testing, the e2e test suite would need to be modified but could potentially be extended to test this.

Hey @Olshansk

Q1:

In general 👍 . At first, I was confused about where fuzzing applies in all of this but makes sense now. Still important to note most invalid height sessions are due to the asynchronous fashion of p2p propagation when a new block is produced. So very likely, the average session block difference will be 1.

Q2:

Your above solution should work as well for validating a relay. But gut feeling tells me there was another line of code you might have to change. I can't recall but will get back to you if I do remember.

Q3:

It's important to note that relay validation is done on mostly on the servicer (client) side, only. The only network enforcement that servicers should pay attention to is the ClaimSubmissionWindow on when to submit their claim.

So ultimately, it is up to the servicer to decide the rules by which they want to service a relay. Economically, it is in their best interest to serve as many relays as they can be rewarded for.

The configuration should be set where most pocket client configuration lives - PocketConfig struct. Actually, there already exists a client configuration that follows a similar validation called ClientBlockSyncAllowance. However, this validation is only for tolerance for block heights differences, not session heights. You can read more about the block height validation through the RelayMeta Validate function.

FYI: So if the client is configured to allow for a session block tolerance of 1 for relay validation, and we know that the network enforces that claims are submitted within 3 session blocks, then the nodes need to be configured to leave some buffer room before they submit a claim so that way they can continue to service old session relays. If they submit a claim for the old session, and the gateway is still sending a relay on the old session block, the app users will receive an EvidenceSealed Error because once a claim is submitted, the evidence is sealed (1, 2).

So what to actually change?

  • Add your LoC suggestion
  • Make sure claims are not submitted too early, otherwise it will be sealed.

Note 1: Not really familiar with the E2E suite.
Note 2: I'm still trying to recall if there was anything else to change, but once again, I'll get back to you if I remember.

Some other context on why this check exists:

During the V0 contributor calls, I did ask about this change. Andrew mentioned that the intended reason behind this check was to prevent applications from sending a "valid but long ago session" relay and then later submitting a challenge request maliciously. However, IMO a one-block difference is "worth it" because:

  1. gives enough time for the network to propagate and decrease the errors our app users gets.
  2. the app side is permission and does not send challenge requests today.
  3. A timebox solution doesn't really solve the problem, just lessens the surface area, and afaik it's going away in V1 :)

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/pre-propoal-dan-distributing-the-altruist-network-community-chains-introduction/4020/1

This issue has been mentioned on Pocket Network Forum. There might be relevant details there:

https://forum.pokt.network/t/pip-28-dan-distributing-the-altruist-network-community-chains-introduction/4020/59

Completed as part of the RC 0.10 release