pokt-network/pocket-core

[Guardrails] Probabilistic Proofs - Model Claim and Stake Data for Parameter Thresholds

jessicadaugherty opened this issue · 14 comments

Objective

Build a data model using claims and stakes to set a ProbabilityOfProofRequest parameter value that proves with 99% confidence that servicers cannot claim more POKT than they risk burning.

This model should also set the parameter value for ProofRequiredThreshold by using stake data against ProbabilityOfProofRequest, which will enable us to both validate the proposed solution and provide the foundations for calculating how much additional scale can be achieved with these changes.

Origin Document

The V0 growth budget has been documented extensively:

These documents and measurements have resulted in a proposed scaling solution called Probabilistic Proofs:

Goals

  • Determine the values for ProbabilityOfProofRequest and ProofRequiredThreshold parameters
  • Test and validate the values using on-chain data (claims and stakes)
  • Collect data on the % of block space consumed by proofs and claims to determine the new growth budget

Deliverable

  • Build a negative binomial distribution model and determine the initial value for the ProbabilityOfProofRequest parameter using a minimum of 1 week of claims data (see the sketch after this list)
  • Iterate the ProbabilityOfProofRequest value over stake data and determine the initial value for ProofRequiredThreshold
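
As a rough illustration of the guardrail the model needs to satisfy, here is a minimal sketch that assumes each claim below ProofRequiredThreshold is proof-requested independently with probability p (the r = 1, i.e. geometric, case of the negative binomial). The function names and all numeric values are illustrative, not the actual model or chosen parameters.

```python
# Minimal sketch of the guardrail, assuming each claim below the threshold is
# proof-requested independently with probability p (negative binomial, r = 1).
# All names and numbers are illustrative.
import math

def max_unrequested_claims(p: float, confidence: float = 0.99) -> int:
    """Smallest k such that P(at least one proof request within k claims) >= confidence."""
    # P(no proof requested across k claims) = (1 - p)^k <= 1 - confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

def guardrail_holds(p: float, threshold_pokt: float, burn_pokt: float,
                    confidence: float = 0.99) -> bool:
    """True if, with the given confidence, the POKT a servicer could claim via
    unproven claims (each kept just below the threshold) does not exceed the
    amount it risks having burned once a proof is requested and fails."""
    k = max_unrequested_claims(p, confidence)
    worst_case_unproven_pokt = (k - 1) * threshold_pokt  # claims that escape a proof request
    return worst_case_unproven_pokt <= burn_pokt

# Illustrative inputs only:
print(max_unrequested_claims(p=0.25))  # -> 17 with these illustrative inputs
print(guardrail_holds(p=0.25, threshold_pokt=20.0, burn_pokt=15_000.0))
```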

Non-goals / Non-deliverables

  • Implementing the scaling solution

General issue deliverables

  • Update the appropriate CHANGELOG
  • Update any relevant READMEs (local and/or global)

Creator: @jessicadaugherty
Co-owner: @Olshansk

A WIP for this document can be found at #1526. For convenience, here is a direct link to the rendered markdown file.

@RawthiL At your convenience, could you review the document and let me know what you think? Primarily, I'm looking for feedback on the following:

  • Values selected
  • Rationale for the approach of value selection
  • Model selection for the probability distribution
  • Definitions for Success and Failure for the model

I spent a lot of time thinking about the correct definitions and approach here and really tried to see how/where negative binomial distributions could fit in, as we discussed offline. I put together this blog post while thinking through it but ultimately went with an alternate approach.

cc @jessicadaugherty

Great work and nice blog post!
I made some comments, but it seems to be on the right track; the selected distribution is correct.
I think that by changing the approach from using the PDF to the CDF you could follow almost the same rationale and get a more accurate result. Instead of aiming for a 1% chance of getting k failures, you should aim for "99% of the cases will have fewer than k failures".
I will try to get the values for \mu and \sigma from the transactions data.
The definitions of success and failure are only a convention; I like how you presented them, since they translate directly into distribution parameters.

I will default to the standard excuse of any statistics guy =P

this distribution (the exponential) is a particular case of the one I suggested (negative binomial)
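
To make the suggested PDF-to-CDF switch concrete, here is a minimal sketch using scipy's negative binomial (with r = 1 it reduces to the geometric/exponential-like case mentioned above). The candidate values of p are placeholders, not proposals.

```python
# Sketch of the CDF-based criterion: instead of asking for a 1% chance of
# exactly k failures (PDF), pick k so that at least 99% of cases see k or
# fewer claims without a proof request (CDF). The p values are placeholders.
from scipy.stats import nbinom

CONFIDENCE = 0.99
R = 1  # proof requests we wait for; r = 1 reduces to the geometric case

for p in (0.10, 0.25, 0.50):
    # nbinom counts the failures (claims without a proof request) before the r-th success
    k = int(nbinom.ppf(CONFIDENCE, R, p))
    print(f"p = {p:.2f}: P(at most {k} unrequested claims) >= {CONFIDENCE:.0%}")
```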

I will give you some info here; if you prefer, I can make a report or repo to share the calculations.

The distribution of requested POKT per claim is a mess (as expected). Here is a sample of the top 4 chains, for clarity:

[Figure: distribution of requested POKT per claim for the top 4 chains]

I integrated it numerically to obtain the CDF and then calculated the thresholds. The following table shows the percentage of claims that fall below the given POKT value. This should guide the selection of the ProofRequirementThreshold:

Percentage   Limit (POKT)
25.0         1.010101
50.0         2.525253
75.0         5.555556
90.0         16.161616
95.0         18.181818
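
For reference, a table like this could also be reproduced directly from the per-claim amounts using empirical percentiles (rather than numerically integrating a fitted PDF, which is what was done above). The file name below is hypothetical.

```python
# Sketch: empirical percentile thresholds from per-claim POKT amounts.
# "claims_pokt.csv" is a hypothetical file with one requested-POKT value per claim.
import numpy as np

claim_pokt = np.loadtxt("claims_pokt.csv")

for pct in (25.0, 50.0, 75.0, 90.0, 95.0):
    limit = np.percentile(claim_pokt, pct)
    print(f"{pct:5.1f}% of claims request <= {limit:.6f} POKT")
```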

Here are some graphs showing where these thresholds lie in the CDF and PDF of POKT per claim:

[Figures: the thresholds overlaid on the CDF and PDF of POKT per claim]


Regarding the number of failed claims (for any reason) in the current network, the percentage is really small. Out of more than 500k claims, only 0.43% ended up failing. There is minimal risk of being slashed if you do things right.


Finally, regarding the % of the block that's occupied by proofs, we don't have the data, and we are a little skeptical that we will be able to get it. We will tell you later once we have a final thought on this...


Thank you! Also, really appreciate the initial direction you provided as well as all the detailed feedback.

I left a follow up thought/comment here: #1526 (comment)

@RawthiL Following up on the distribution data you provided. That was exactly what I was looking for and appreciate the clarity. If you push it to a gist, I can append it to our spec?

Side note: interesting how the claim distribution is bimodal to some degree.

Here are the final values using that outcome:

[Screenshot: final parameter values]

If you push it to a gist, I can append it to our spec?

No idea how that works, but it would require the data to be somewhere. (gist cannot hold data files I think)


The ProofRequirementThreshold should be updated regularly, as it can vary for many reasons:

  • Updates of the RTTM parameter (this changes weekly).
  • Changes in the balance of stake across servicer nodes. If more nodes are staked with 60K POKT, the distribution will shift upwards (but this is not likely to have much of an effect).
  • Addition of new chains / changes in relay volumes. If a chain starts to push too many relays, it will likely translate into higher claims (as they pack more relays).

Should this proposal include a mechanism to calculate the threshold and update it on a regular basis? (A sketch of such a drift check follows below.)
Moreover, the values that I calculated were obtained by analyzing 100 sessions (400 blocks), from height 85981 to height 86381. In a few weeks, when SER kicks in, they will surely change.
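
A hypothetical sketch of what such a regular recalculation could look like: recompute the percentile-based threshold over a recent window of claims and flag a review when it drifts from the configured value. The function and parameter names are invented for illustration.

```python
# Hypothetical drift check for a regularly recomputed threshold.
import numpy as np

def threshold_needs_review(recent_claim_pokt: np.ndarray,
                           current_threshold: float,
                           percentile: float = 95.0,
                           tolerance: float = 0.25) -> bool:
    """Recompute the percentile-based threshold over a recent window of claims
    and flag a review when it drifts more than `tolerance` (relative) from the
    currently configured ProofRequirementThreshold."""
    fresh = np.percentile(recent_claim_pokt, percentile)
    return abs(fresh - current_threshold) / current_threshold > tolerance
```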

Careful, k = 16 does not represent 99.99%. In probability, as you approach the limits, the numbers grow really fast:
99% -> k=16
99.9% -> k=24
99.99% -> k=32
This is why security with 99.99% (like atomic applications) is so expensive...
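
For illustration, the progression above (16, 24, 32) is reproduced by the negative binomial quantile if one assumes a per-claim proof-request probability of 0.25; that value is an assumption for this sketch, not necessarily the chosen parameter.

```python
# How quickly k grows as the confidence target gains nines.
# p = 0.25 is an assumed ProbabilityOfProofRequest; it reproduces 16 / 24 / 32.
from scipy.stats import nbinom

p = 0.25
for confidence in (0.99, 0.999, 0.9999):
    k = int(nbinom.ppf(confidence, 1, p))  # failures before the first proof request
    print(f"{confidence:.2%} -> k = {k}")
```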


Side note: interesting how the claim distribution is bimodal to some degree.

Fun answer: to me it looks like the Cs-137 spectrum.
This "bimodality" is the effect of the CP paying bad nodes for measuring them. Before GeoMesh, the network-wide distribution was almost an exponential; the second lobe of higher rewards was completely diluted by the low rewards. This was one of the reasons why I think PIP-22 was not (completely) a fit at that time.

No idea how that works, but it would require the data to be somewhere. (gist cannot hold data files I think)

Maybe a Google Drive?


Careful, k = 16 does not represent 99.99%. In probability, as you approach the limits, the numbers grow really fast:

Thanks for calling that out. It was a typo on my end. I intended 2 nines of confidence (99%), not 2 decimal nines (99.99%) 😅


Fun answer: to me it looks like the Cs-137 spectrum.

TIL! Gives some context to the Chornobyl accident and how quickly things degraded.

This "bimodality" is the effect of the CP paying bad nodes for measuring them.

Interesting observation.

Before GeoMesh, the network-wide distribution was almost an exponential; the second lobe of higher rewards was completely diluted by the low rewards.

Good context. I'll keep this in mind when reading the data.


Re ProofRequirementThreshold - I'd like to pick out values that are sufficient given a one-time analysis and only need to be reviewed every couple of months or when very large shifts in the network happen.

I read all of the points you provided, but I would like to emphasize that this is a feature with a mid-length lifespan (until V1 launches), so we're aiming for "good enough with low maintenance" rather than "great".

The cost of not implementing it (assuming the network grows) is to cap the network's capacity at ~3B. The tradeoff between a good-enough solution and limiting the network's growth is worth it, but we don't want to invest too much of the team's time.

For example, the following would trigger a review of the parameter:

  • Large changes in inflation
  • Large changes in relays
  • Large changes in chains supported
  • 2 months have passed

If none of these come into play, I believe the penalty/reward/scaling tradeoff is worthwhile given the numbers we have selected. Does that approach make sense?

Here is the Gist with all the data.
Using the available info and data from before block 86500 (after that there was a change in the number of Apps in the network), I found that the block TX space is composed of:

  • Proofs: 67.06 %
  • Claims: 21.64 %

This is good, I think...
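
For context, a breakdown like the one above could be computed along these lines, assuming the block transactions can be enumerated with their type and encoded size; the tuples below are placeholders, not real measurements.

```python
# Sketch: share of block TX space per transaction type.
from collections import defaultdict

def tx_space_breakdown(txs):
    """txs: iterable of (tx_type, size_bytes) pairs. Returns % of TX bytes per type."""
    totals = defaultdict(int)
    for tx_type, size in txs:
        totals[tx_type] += size
    grand_total = sum(totals.values()) or 1
    return {t: 100.0 * s / grand_total for t, s in totals.items()}

# Placeholder data only:
print(tx_space_breakdown([("proof", 1200), ("claim", 400), ("send", 250)]))
```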

Regarding the review of the parameters, the low-maintenance mode is OK; we can follow the guidelines you provided.

Just noticed the author names; my name is:

Ramiro Rodríguez Colmeiro

(double last name, common in the southern lands)

(double last name, common in the southern lands)

@RawthiL I've updated your name in the document :)

Regarding the review of the parameters, the low-maintenance mode is OK; we can follow the guidelines you provided.

Awesome, we're on the same page 👌

Proofs: 67.06 %
Claims: 21.64 %
This is good, I think...

Yes, this is perfect. It vindicates the approach with data.


I'll be at ETH Denver, so if I don't get to it this week, I'll put the finishing touches on the document next week (include your data, improve the flow, etc.).

Also, I appreciate all your help with this. The data, the feedback, the tips, etc. I learnt a ton going through this exercise and am looking forward to many more similar ones in v1. 🙏 💯

Hi everyone,

If the goal is to fit more claims and proofs into a block, wouldn't it be simpler to investigate and experiment with increasing the max block size as a stop-gap solution to scale out V0?

Hey @PoktBlade - thanks for chiming in. I'd love to see that data as well before committing to this implementation. Details about the block size experiment can be found here: #1503

Is the idea to gather details about both implementations (block size vs probabilistic proofs) and then make a decision on how to scale v0? Off the top of my head, researching block size appears to be the lowest-friction route. I'll move my convo over there.

That was the plan exactly, @PoktBlade! A goal was to automate some of these measurements too so we could keep the growth budget data up to date, although I am not sure how important that goal is anymore, as I don't know how much of it is applicable to V1 scalability.