nucypher/nucypher

Peer Snitching Protocol

Opened this issue · 18 comments

The current DKG protocol doesn't offer any insight into the observed network connectivity of nodes, so it's possible for a node to be effectively unreachable while still following the DKG protocol. This is because the current protocol only requires blockchain read/write interactions through the Coordinator contract. Although in some cases the situation may be just temporary (e.g., a VPS affected by service provider maintenance), in other cases it may signal improper node configuration (e.g., a firewall).

I share here a first idea of a protocol that involves nodes snitching on each other, in this case working completely on top of the current DKG ritual (a rough sketch of the report cross-checking follows the list):

  • Nodes currently have a pretty clear idea of which other nodes are not reachable or verified, thanks to our P2P Discovery Loop. We will use this information in the next stages.
  • At the transcript-posting stage, nodes report which nodes from the selected cohort are unverified and/or not reachable.
  • At the aggregation-posting stage, nodes check on each other again if necessary and provide a new report.
  • Snitching reports are cross-checked and, for each node, if more than C% of the reports agree, the node is added to a list.
  • If this list includes more than a threshold of U nodes, the DKG ritual fails. Otherwise, the ritual can continue (this of course implies that some DKG resilience exists and that initiators may sample nodes that turn out not to be reachable).
  • Regardless of DKG status, nodes in the list are penalized somehow (probably not slashing, but maybe a reduction in rewards).
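A minimal sketch of what the cross-checking step could look like (illustrative Python; C and U are placeholder parameters, not final protocol values):

```python
from collections import Counter

C = 0.5   # illustrative: fraction of reports that must agree to flag a node
U = 2     # illustrative: max number of flagged nodes before the ritual is aborted

def cross_check_reports(reports: dict[str, set[str]]) -> tuple[set[str], bool]:
    """Cross-check snitching reports from a DKG cohort.

    `reports` maps each reporting node's address to the set of cohort members
    it considers unreachable/unverified. Returns the set of flagged nodes and
    whether the ritual may continue.
    """
    if not reports:
        return set(), True
    num_reporters = len(reports)
    counts = Counter(addr for reported in reports.values() for addr in reported)

    # A node is flagged if more than C of the reports agree it is unreachable.
    flagged = {addr for addr, n in counts.items() if n / num_reporters > C}

    # If too many nodes are flagged, the DKG ritual fails.
    ritual_can_continue = len(flagged) <= U
    return flagged, ritual_can_continue
```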

Note that since there's an honest majority assumption (and on top of it, DKG rituals will try to include some anti-collusion measures, see #3269 and #3365, which further ensure independently operated nodes), we don't have to worry too much about potential collusion.

Another potential Peer Snitching Protocol can occur completely outside DKGs. Let's assume nodes perform some sort of recurrent heartbeat TX (say every 24 hours). Snitching reports can be added to the heartbeat, which can be cross-checked later and used as uptime evidence.

It's worth noting that, apart from the "punitive" measures on rewards, the fact that snitching reports are on-chain means they can be used by operator/staker-facing UIs like the Threshold dashboard, the nucypher client, etc.

Originally proposed as a way to report on uptime during DKG events, this can be extended into the basis of a reputation system for off-chain metrics that can be fed into the protocol. The overall flow is that Peer Snitching Reports feed a Reputation/Scoring system (which can also use on-chain metrics like DKG participation faults), which can later be used for Protocol Inclusion/Exclusion, Reward Penalties, and Slashing.

I'm posting here some of the points discussed at the first peer-snitching meeting on Feb 19th:

We can consider three layers for this task:

  1. Evidence collection: data on node status and failures is gathered.
  2. Scoring system: operators have an associated score that depends on their behavior on the network.
  3. Protocol operations: based on the operators' score, the protocol will decide how to operate.

The outcomes can be used in several areas: operator slashing, reward calculation, DKG process refinement...

For the evidence collection, we can consider that the data can come from two sources: on-chain and off-chain.

  • On-chain: meaningful events can be the failure to post a transcript (causing the DKG ritual to fail), a change of operator (client)...
  • Off-chain: maybe a contract in which nodes report that a node is not active (although this will cost a bit of MATIC).

OK, so the current plan is to have a contract on an L2 (cheap) so that nodes can submit their view of the network state. The simplest example of this would be to have them post the information seen on the status page - verified/unverified nodes. For now, let's have them only post the operator addresses of the unverified nodes.
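Roughly, the node-side submission could look like this (Python sketch with web3.py; the contract name `SnitchRegistry` and the `reportUnverified` function are hypothetical, not an existing interface):

```python
from web3 import Web3

# Hypothetical ABI fragment for the L2 reporting contract; the function name
# and signature are illustrative only.
SNITCH_ABI = [{
    "name": "reportUnverified",
    "type": "function",
    "stateMutability": "nonpayable",
    "inputs": [{"name": "operators", "type": "address[]"}],
    "outputs": [],
}]

def submit_unverified_report(w3: Web3, contract_address: str, sender: str, unverified: list[str]):
    """Post the operator addresses this node currently sees as unverified."""
    contract = w3.eth.contract(address=contract_address, abi=SNITCH_ABI)
    # Assumes the sender account is managed by the connected node or signing middleware.
    return contract.functions.reportUnverified(unverified).transact({"from": sender})
```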

2 options at this point - weight each node by stake, or not.

Some considerations:

  • nodes need to be snitching at around the same time, so there should be an open period (say 1 hour) where they can submit.
  • something needs to trigger a vote. This could be an event emitted by the contract (there are services for scheduling contract calls) or it could be hard-coded within the node - i.e. every Wednesday at 03:00 UTC.

This is easy to game. Once you know the schedule, or listen for the event, a node can come online for the hour to ensure it doesn't get snitched on. There is also no real way to know if a node is lying; we assume that the honest majority assumption holds.

We could use Chainlink VRF (Verifiable Random Function) to generate a random number that determines the start of the reporting window. This makes it much harder to game - a user will struggle to start/stop their nodes on such short notice.

This is effectively a vote by the nodes. We could take the stake-weighted vote idea further:

  • randomly select nodes until 51% of total stake is selected, then calculate results
  • randomly select one node (based on stake weight) and use them as "truth"
  • similar to tBTC, randomly select nodes (with replacement) based on stake weight until we have 25/50/100 votes

This reduces the threat of Sybil attacks - you can't affect a vote by merely having a massive stake or many nodes. You also need the random selection to be on your side.
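As a sketch of the third option (tBTC-style stake-weighted sampling with replacement; illustrative Python, with the seed assumed to come from an on-chain randomness source such as a VRF output or blockhash):

```python
import random

def sample_voters(stakes: dict[str, int], num_votes: int, seed: int) -> list[str]:
    """Stake-weighted random selection of voters, with replacement.

    `stakes` maps operator address to staked amount. A single operator can be
    drawn multiple times, so voting power follows stake in expectation, but
    any particular vote still depends on the random draw.
    """
    rng = random.Random(seed)
    operators = list(stakes)
    weights = [stakes[op] for op in operators]
    return rng.choices(operators, weights=weights, k=num_votes)
```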

Should we punish nodes that fall on the wrong side of the vote? Should we reward the nodes that fall on the correct side of the vote?

A simple implementation would be:

  1. Withhold rewards for the nodes who are deemed to be inactive for that period. If voting happens once per week, then one week's worth of rewards is withheld.
  2. Distribute the withheld rewards among all nodes that voted on the "correct" side.

Nodes who voted "incorrectly" are not punished, but they miss out on some rewards.
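For illustration, the settlement per voting period could look like this (Python sketch; the equal split among correct voters is just one choice, a stake-weighted split would work too):

```python
def settle_epoch_rewards(
    rewards: dict[str, float],
    inactive: set[str],
    correct_voters: set[str],
) -> dict[str, float]:
    """Withhold this period's rewards from inactive nodes and redistribute
    them among the nodes that voted on the "correct" side.

    Nodes that voted "incorrectly" keep their base reward but get no share
    of the withheld pool.
    """
    withheld = sum(rewards[node] for node in inactive)
    settled = {node: (0.0 if node in inactive else amount) for node, amount in rewards.items()}
    if correct_voters:
        bonus = withheld / len(correct_voters)  # could also be stake-weighted
        for node in correct_voters:
            settled[node] += bonus
    return settled
```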

the "negative proof" problem is tricky. instead, what if each node submitted evidence that it IS online, rather than snitching on each other??

  1. Initiation of Submission Period
    The L2 smart contract randomly initiates a submission period. This randomness ensures that nodes cannot predict when they need to be online. The contract publishes a unique nonce or the current block number. Using a nonce might offer better randomness and security, as it's specifically generated for this event, whereas a block number could be predicted.

  2. Evidence Collection
    Each node is responsible for proving its online status by interacting with other nodes. Upon receiving a request, a node evaluates whether the requesting peer is online (based on successful communication) and signs a message containing the peer's address and the nonce/block number. This signed message serves as evidence of the requester's online status.

  3. Submission of Evidence
    A node must collect enough evidence to meet a certain threshold before submitting to the contract. This could be a simple majority of its peers or a weighted majority based on stake.
    To reduce transaction costs, a node aggregates all collected signatures into a single transaction. Could we use Merkle trees to compress this further? Should we store this on IPFS/Arweave? We probably need to test and see the costs.

  4. Verification and Recording
    Upon receiving a submission, the smart contract verifies the aggregated evidence. This includes checking the signatures' validity, ensuring the nonce/block number matches the current submission period, and confirming that the evidence meets the required threshold.

Any nodes who do not submit sufficient evidence within the defined period will lose their rewards.
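Putting steps 2 and 3 together, the node-side flow could look roughly like this (illustrative Python; the `request_attestation` peer RPC is hypothetical, and the threshold is left as a simple count, though it could be stake-weighted):

```python
from eth_account.messages import encode_defunct
from eth_account.signers.local import LocalAccount

def attest_peer_online(signer: LocalAccount, peer_address: str, nonce: int) -> bytes:
    """Run by a peer that has just communicated successfully with `peer_address`:
    sign (peer_address, nonce) as evidence that the peer is online this period."""
    message = encode_defunct(text=f"{peer_address}:{nonce}")
    return signer.sign_message(message).signature

def collect_evidence(my_address: str, nonce: int, peers, required: int) -> list[bytes]:
    """Ask peers for signed attestations until the threshold is met.

    `peers` is an iterable of peer handles exposing a hypothetical
    `request_attestation(address, nonce)` call.
    """
    signatures = []
    for peer in peers:
        try:
            signatures.append(peer.request_attestation(my_address, nonce))
        except Exception:
            continue  # unreachable or misbehaving peer: move on and try another
        if len(signatures) >= required:
            break
    if len(signatures) < required:
        raise RuntimeError("could not collect enough evidence this period")
    return signatures  # submitted to the L2 contract in a single transaction
```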

Great start! Some thoughts and doubts about this proposal:

2 options at this point - weight each node by stake, or not.

What do you mean here? To weight the voting power of each node, or to weight the probability of being selected to vote?

A simple implementation would be: 1. Withhold rewards for the nodes who are deemed to be inactive for that period. If voting happens once per week, then one week's worth of rewards is withheld.

Heads-up that (if I understood correctly) this can be tricky to implement, taking into account that you will be able to withdraw your rewards from the TACo Application with a resolution of seconds, so it could be tricky to withhold rewards (at least, more difficult than if you were calculating rewards monthly).


About the alternative to snitching (each node posting its own evidence):

Indeed it looks simpler, but:

  1. Evidence Collection

My concern here is that the evidence collected through interactions doesn't seem like a very good sampling method to me. What happens with less active nodes (those that are not sending requests)? Also, it occurs to me that a possible attack could be to target low-activity nodes and send malicious reports about them, so that the probability of the victim node being punished is high. Can we avoid this kind of situation?


Additionally, another concern is that we should be very cautious about the cost of this peer-snitching protocol: it doesn't make sense to have nodes constantly paying fees to submit reports, so we should measure this cost carefully and weigh the probability and cost of dishonest nodes against the cost, for the whole protocol, of verifying that nodes are working honestly.

Heads-up that (if I understood correctly) this can be tricky to implement, taking into account that you will be able to withdraw your rewards from the TACo Application with a resolution of seconds, so it could be tricky to withhold rewards (at least, more difficult than if you were calculating rewards monthly).

We can decrease the proportion of the reward as that punishment. For example, if a node has 1000 T authorized, then after the node sets its public key we use all 1000 T to calculate its portion of the reward. In that case we can decrease the 1000 T to any amount (let's say 900 T), which will affect only the reward calculation. The staker will still have 1000 T.

My concern here is that the evidence collected through interactions doesn't seem like a very good sampling method to me. What happens with less active nodes (those that are not sending requests)?

Can you explain further? What do you mean by a less active node?

When it comes to evidence collection - each node must collect its own evidence by sending requests to its peers. If you don't do that, then you're not running properly, and you'll be punished.

Also, it occurs to me that a possible attack could be to target low-activity nodes and send malicious reports about them, so that the probability of the victim node being punished is high. Can we avoid this kind of situation?

I don't really understand; there's no selection happening. And nodes aren't sending reports about each other, they're collecting them for themselves. If you send a request for evidence to a peer and they either don't respond or send you back rubbish, then you move on and collect evidence elsewhere. You just need to collect enough to reach the threshold of 51% of the total staked T (or something similar).

When it comes to evidence collection - each node must collect its own evidence by sending requests to its peers. If you don't do that, then you're not running properly, and you'll be punished.

I see. When you mentioned "interaction with other nodes" I wrongly interpreted it as the interaction that occurs in the TACo protocol (decryption requests, etc.). Now your proposal is clearer to me.

the "negative proof" problem is tricky. instead, what if each node submitted evidence that it IS online, rather than snitching on each other??

  1. Initiation of Submission Period
    The L2 smart contract randomly initiates a submission period. This randomness ensures that nodes cannot predict when they need to be online. The contract publishes a unique nonce or the current block number. Using a nonce might offer better randomness and security, as it's specifically generated for this event, whereas a block number could be predicted.

What about using blockhashes? This way there's no need for anyone to trigger the event, and you could do it in a probabilistic way, for example:

For every N-th block, check its blockhash H_i. If H_i mod M < T, then a new submission period just started; otherwise, simply ignore.

By tuning N, M and T you can determine the expected frequency of new submission periods. For example, assuming an average block production time of 2 seconds (like in Polygon Mainnet), then N = 200, M = 216 and T = 1 should imply a new submission period per day (on average), since 200 blocks take ~400 seconds and there are 216 slots of 400 seconds in a day. More generally, for an expected frequency F (in seconds), N can be freely chosen and the parameters M and T just need to satisfy F * T = 2 * M * N (where 2 is the block time in seconds). Choosing N below 256 is interesting because only the most recent 256 blockhashes are available to smart contracts.

This of course assumes that nodes can't predict the next blockhashes, but I think it's a safe assumption, and anyway, doesn't seem like a relevant threat for our scenario.
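A minimal off-chain sketch of this check (Python with web3.py, using the illustrative parameters above; on-chain, the same check would use the blockhash of one of the most recent 256 blocks):

```python
from web3 import Web3

# Illustrative parameters: with ~2 s Polygon blocks, N = 200, M = 216, T = 1
# gives roughly one new submission period per day on average.
N, M, T = 200, 216, 1

def submission_period_started(w3: Web3, block_number: int) -> bool:
    """Check whether `block_number` opens a new submission period.

    Only every N-th block is considered; its blockhash (unpredictable in
    advance) is reduced modulo M and compared against the threshold T.
    """
    if block_number % N != 0:
        return False
    block_hash = w3.eth.get_block(block_number)["hash"]
    return int.from_bytes(block_hash, "big") % M < T
```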

  1. Evidence Collection
    Each node is responsible for proving its online status by interacting with other nodes. Upon receiving a request, a node evaluates whether the requesting peer is online (based on successful communication) and signs a message containing the peer's address and the nonce/block number. This signed message serves as evidence of the requester's online status.
  2. Submission of Evidence
    A node must collect enough evidence to meet a certain threshold before submitting to the contract. This could be a simple majority of its peers or a weighted majority based on stake.

Both simple majority and weighted majority can be obtained from the info in TACo (child) app contract.

To reduce transaction costs, a node aggregates all collected signatures into a single transaction. Could we use Merkle trees to compress this further? Should we store this on IPFS/Arweave? We probably need to test and see the costs.

Perhaps it'd be useful to explore BLS signatures, since they can be aggregated into a single signature, which makes sense in this case because the message to sign is always the same (i.e., peer's address and nonce/blockhash). This however poses other challenges (aggregating the public keys, necessary precompiles, etc.)
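For illustration only, here's the aggregation pattern using py_ecc's IETF BLS implementation (which is BLS12-381 rather than a curve with EVM precompiles); the point is just that one aggregate signature plus the signers' public keys can cover all attestations of the same message:

```python
from py_ecc.bls import G2ProofOfPossession as bls

# All peers sign the same message (requester address + nonce), so a single
# aggregate signature plus the list of public keys is enough evidence.
message = b"0xRequesterAddress:nonce-1234"          # illustrative payload
secret_keys = [1234 + i for i in range(5)]           # toy keys, for demo only
public_keys = [bls.SkToPk(sk) for sk in secret_keys]

signatures = [bls.Sign(sk, message) for sk in secret_keys]
aggregate = bls.Aggregate(signatures)

# Verifies all 5 attestations with a single aggregate check.
assert bls.FastAggregateVerify(public_keys, message, aggregate)
```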

  1. Verification and Recording
    Upon receiving a submission, the smart contract verifies the aggregated evidence. This includes checking the signatures' validity, ensuring the nonce/block number matches the current submission period, and confirming that the evidence meets the required threshold.

Any nodes who do not submit sufficient evidence within the defined period will lose their rewards.

For every N-th block, check its blockhash H_i. If H_i mod M < T, then a new submission period just started; otherwise, simply ignore.

This is a cool idea 😎

Perhaps it'd be useful to explore BLS signatures, since they can be aggregated into a single signature, which makes sense in this case because the message to sign is always the same (i.e., peer's address and nonce/blockhash). This however poses other challenges (aggregating the public keys, necessary precompiles, etc.)

Nice, I'll do some research into which precompiles are available.

A basic idea of how the smart contract would look is now here: nucypher/nucypher-contracts#248

@cygnusv I spoke with @piotr-roslaniec and he suggested looking into BN254 which has support for aggregated signatures and has precompiles available on EVMs.

Some info here https://hackmd.io/@liangcc/bls-solidity

In particular, BN254 beats ECDSA when verifying 38+ signatures. If you want to store those signatures, then it obviously beats ECDSA. I don't know if we want to store those signatures, or just the result. We could instead emit an event when a node submits evidence.

Both simple majority and weighted majority can be obtained from the info in TACo (child) app contract.

By properly utilising the above, we might not need too many signatures for each node. I need to look at stake distributions first.

Some more shower thoughts...

As the adopter set and usage grow, the collusion risk also grows. Take Bqeth: they're doing crypto inheritance, which means large sums of money are up for grabs. Because we use a new cohort per adopter, it's easy to find out the set of entities responsible for anything Bqeth encrypts - and that group could very well choose to collude, knowing there's a good chance of decrypting a high-value payload.

How do we guard against this? We need to be sending decryption requests for all active rituals that are indistinguishable from standard user decryption requests. Currently this isn't possible because of the EncryptorAllowList. If we added a Peer Snitching address to the allow list, the nodes would be able to adjust their behavior according to who had signed the encryption.

I think we should look at enabling Ring Signatures at the allow list level. That way the encryptor's identity remains hidden but they can still be validated. This is possibly something that would be adopter-specific and enabled on a per-cohort basis when greater security (or privacy) is required.

The Peer Snitching would then include a new component: at a random time interval t (using a Poisson distribution to keep things regular yet random), a decryption request is sent for the relevant Rituals. Any node that responds incorrectly (in either direction) can then be penalized. Penalties occur for every infraction. Penalties could be different for the two scenarios (a timing sketch follows the list):

  • not decrypting when conditions are satisfied - low penalty due to low impact
  • decrypting when conditions aren't satisfied - high penalty due to high impact
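A rough sketch of the probing component (illustrative Python; `send_probe`, `penalize`, and the response fields are hypothetical placeholders for whatever infrastructure ends up issuing the requests):

```python
import random
import time

def probe_loop(rituals, send_probe, penalize, mean_interval_seconds: float = 6 * 3600):
    """Send probe decryption requests at Poisson-distributed times.

    Exponentially distributed gaps between probes give Poisson-process timing:
    regular on average, but individual probes are unpredictable. `send_probe`
    issues a decryption request for a ritual and returns each node's response;
    `penalize` applies the asymmetric penalties discussed above.
    """
    while True:
        time.sleep(random.expovariate(1.0 / mean_interval_seconds))
        ritual_id = random.choice(list(rituals))
        for node, response in send_probe(ritual_id).items():
            if response.decrypted and not response.conditions_satisfied:
                penalize(node, severity="high")  # decrypted when it shouldn't have
            elif not response.decrypted and response.conditions_satisfied:
                penalize(node, severity="low")   # refused when it should have decrypted
```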

Interesting idea! My first thought:

The Peer Snitching would then include a new component: at random time interval t (...) a decryption request is sent for relevant Rituals.

Sent by whom? How can ring signatures prevent encryptor's free-riding (which was the motivation to create encryptors' allowlists in the first place)?

The ring signature would enable a node to determine that someone from the allow list signed the encryption, just not who exactly. In our case, the Ring would always be the full set of allowed encryptors.

Sent by whom?

Good question. I'd imagine that whatever setup we end up with for automatic rewards distribution would also be valid here.