Gossipsub Dynamic Message Diffusion
Opened this issue · 15 comments
Problem
As per the current Gossipsub spec, messages are pushed to all peers within the peer’s mesh, defined by the variable D
, the mesh size. Inevitably, this results in duplicate messages, which is generally expected and a somewhat desirable property to defend against malicious behaviour or attacks. At the same time, the number of duplicates should be limited to the extent possible to avoid overloading nodes.
Intuitively, given that the rate of diffusion stays stable throughout the 4-second propagation window, we expect to see more duplicates towards the end of the window, when the message has already propagated to the majority of the network.
Solution Spectrum
Approach 1: In order to reduce the amount of duplicates as a result of excessive diffusion at the end of the 4-second window, we propose that dynamic diffusion should be considered. According to the dynamic diffusion scheme, more aggressive propagation of messages takes place at the beginning of the window and slower propagation towards the end. For example, a message is propagated to all D
peers of a peer’s mesh during the first two seconds and to D/2
peers during the latter two seconds of the window. However, setting diffusion parameters based on time doesn’t sound like a great choice - makes things rigid and tied to clocks of different nodes in the network, which is generally not a great idea.
Approach 2: An alternative way of defining the aggressiveness of message diffusion is through the number of hops that a message has propagated, e.g., a message is pushed to: D
nodes when hop_count < 2
from the message producer, D/2
nodes when 2 < hop_count < 4
and D/4
when hop_count > 4
. This, however, means that the message/block producer becomes known to its immediate peers that see hop_count = 0
, although there could be techniques to obfuscate that a little bit.
Approach 3: A much simpler alternative is to just reduce D
from the current default of 8
down to 6
. This approach won’t solve the issue of more duplicates towards the end of the propagation window (i.e., it’s not dynamic), but will reduce the diffusion velocity throughout the 4-second window and as a result the amount of duplicates throughout. From the Block Arrival times seen at ProbeLab’s dashboard [link], there seems to be space to reduce aggressiveness (and duplicates) and (inevitably) increase block propagation time - we see both mean and median block arrival times well below the 4 second mark, hovering on average around 2.1-2.3 secs.
The purpose of this issue is to gather feedback on whether investigating this further is desirable, as well as ideas on how to formulate it better (e.g., based on hop count or some other parameter).
[Initial ideas in this issue have been discussed with @cortze @AgeManning @cskiraly and others during Devcon 7.]
I would like to oppose Approach 2, as it completely eliminates anonymity of validators.
Some people said there is already no anonymity in the p2p network, so we don't have to worry about it. That's not true. The only way to guess the IP addresses of validators is to observe the message receiving time in the GossipSub topics, which reduces the anonymity set, but doesn't give you the exact IP addresses.
But with Approach 2, you will know the exact IP addresses.
Why we care about anonymity?
Some people may only think of the DoS attack potential. That's not actually the only problem. IP addresses tell "who you are". For example, if I solo-stake and later withdraw the money and transfer it to an EOA, I would lose the pseudonymity property of the EOA automatically, because you can link the EOA address to "who you are".
there could be techniques to obfuscate that a little bit.
I think if you have some techniques in mind, it will be helpful to list them here.
and (inevitably) increase block propagation time
Currently we use the block propagation time as a metric to see if the network is doing well or not. See [1].
If Approach 3 increases the block propagation time, it will contradict to the metric. So it means the approach is not good.
Remember that the goal is not to reduce the number of duplicates but the goal is to make everyone receive the block/blobs as fast as possible with a larger number of blobs (and with a reasonable network topology). Reducing the number of duplicates is one of the ways.
So it's hard to say if D=6
makes the network more scalable or not.
we see both mean and median block arrival times well below the 4 second mark, hovering on average around 2.1-2.3 secs.
Looking at mean or median doesn't make any sense. We have to look at one of the worst cases. Like in [1], we use P95.
[1] https://ethresear.ch/t/block-arrivals-home-stakers-bumping-the-blob-count/21096
@yiannisbot - Thank you for making this issue. These have been on my to-do list but haven't got around to it yet.
@ppopth - The anonymity argument has been a long-standing one and has historically prevented us from potentially making improvements to the protocol. I am a strong advocate for security, however I think it is useful to understand the exact trade-offs. I don't know how much of a bandwidth/latency gain we could make if we sacrificed pseudo-anonymity. If its >95%, perhaps it's worth it?
I also don't have a really strong sense of how anonymous our nodes already are, and whether the argument to keep pseudo-anonymity when it could be already broken is worth not even exploring potential upsides in dropping it.
In the beginning (and I think still now, on some clients/versions) - Validators were required to subscribe to long-lived subnets. This meant that you could just search the DHT, grab any ENR that advertised long-lived subnets and you knew that this ENR corresponded to X validators (where X was the number of long-lived subnets). An ENR contains that nodes IP address. So very easily, you can tie an IP address directly to validators. It would then be a small matter of figuring out which validators that IP address had, but you could do that by connecting to them and watching what subnets they subscribed to and associated attestations.
Even worse, for years we had flood publishing. We only removed it because it was potentially causing a delay in block propagation. With flood publishing, you just get a node, subscribe to everything but have a mesh degree of 0, and then whoever sends you a message, you know instantly their IP, peer-id etc and you can find all the validators.
So, we have had non-anonymity on the network for years, which I know is not a good argument for doing it, but just wanted to add perspective in tradeoffs. We could go back to what we had, but potentially gain network efficacy.
In the current world, (at least in Lighthouse) all BNs subscribe to long-lived subnets, so you can't identify validators from ENRs. However I believe this is changing in PeerDAS with validator custody. I think you can search ENR (which contain IP addresses) and find nodes that have larger custodies. You then have IP adddresses of nodes that have validators. Again once you have that, you connect to those nodes, and watch their traffic to determine which validators they have.
In the early days, a researcher built a few sentry nodes on the network and watched gossipsub traffic and tied IP addresses to all validators, with some statistical uncertainty. It showed even with all the measures in place, it was possible to tie IP addresses to validator indexes, just through gossipsub traffic. I suspect the EF devops team has also done something similar (cc @parithosh @barnabasbusa )?
As a counter-measure (not saying its the most effective or without its own caveats) Lighthouse gives users the option to create two kinds of nodes. Normal BN nodes and separate proposer
nodes. The idea being that you run both, and the proposer node only proposes blocks and doesn't do any attestations so if your attestation node gets deanonymized you can avoid potential DOS vectors when proposing.
For example, if I solo-stake and later withdraw the money and transfer it to an EOA, I would lose the pseudonymity property of the EOA automatically, because you can link the EOA address to "who you are".
Not sure I follow this. Withdrawals can be sent from any node on the network. You can sign these things offline and broadcast them anywhere. But lets say you broadcast from your local node, it just sends the validators value to its withdrawal address. I guess if you match validators to IP's then yeah, you also match withdrawal addresses to IPs (assuming the node that is staking owns the ether).
If you are talking about users who are running ELs and BNs under the same IP, that assumption then ties in the EL network and how anonymous that is and tying transactions to validators.
Approach 1 - I'm not against this. We could add some levers in there that are time-based that help with diffusion. It does sound to have some pitfalls, but if we don't drastically cut-off the diffusion, it could help.
Approach 2 - I agree that this definitely reduces/eliminates pseudo-anonymity of validators. I am of the opinion that it might be worth exploring in simulations and getting a better understanding of our current anonymity to make an informed decision as to whether this might be something worth sacrificing
Approach 3 - We have tried simulating a few of these. Its hard to get something that looks like mainnet. In the end, we have made Lighthouse configurable via CLI that can change the degree. I think for a small period of time we did drop it to 6, but the default is currently 8.
Thanks for the input everyone! Very interesting thoughts and context!
Some further input and comments:
Approach 1
I'm not against this. We could add some levers in there that are time-based that help with diffusion.
I don’t have ideas here, so feel free to share approaches that are worth considering @AgeManning.
Approach 2
I think if you have some techniques in mind, it will be helpful to list them here.
Here’s a rough draft:
Let’s hypothetically assume that hop_count ~ 6
and we want to slow down diffusion by some portion (say to D/2
) for the latter half of the message’s path. We could obfuscate the sender by having the first, say, 3 hops add a random value to the hop_count
. So, the block producer, the first and second hops would all declare hop_count = rand[0-2]
and increase it by 1
before forwarding further.
The approach does not keep the block producer anonymised, but reduces the chances of the next hop knowing with more than 30% certainty that the previous node was actually the block producer.
The approach results in some messages incurring more duplicates and faster propagation, i.e., those that start with hop_count = 0
, than those that start with hop_count = 2
, which result in less duplicates and relatively slower propagation velocity.
There could be several things that could be done to streamline this according to what we’d want (e.g., setting the random hop_count
with a weight), but not sure the overall approach is adding as much obfuscation as would be acceptable.
Approach 3
If Approach 3 increases the block propagation time, it will contradict to the metric. So it means the approach is not good.
Well, the thing is that we don’t know IIUC? :) It could very well be the case that blocks propagate at the same speed (i.e., no increase in propagation latency), just with less duplicates. Looking at the p95 is best, I agree, but if we only want to be based on max arrival time: https://probelab.io/ethereum/block_arrival/2024-48/#message_arrivals_max_min_on_1536s_window_on_topic_mainnet_beacon_block-plot, we’re already in a very bad place.
A good approach here would be to set the default D = 6
for one large client (or a couple smaller) and observe the situation and then decide from there.
Well, the thing is that we don’t know IIUC? :) It could very well be the case that blocks propagate at the same speed (i.e., no increase in propagation latency), just with less duplicates.
Well, we do know, from maths: assuming unsaturated bandwidth (which is broadly the case in the happy case at least) and assuming the initial spread has almost no duplicates this can be used to derive "probable total spread latency" - it's the during the last hops that duplicates happen. If you send to fewer peers in the beginning, the number of peers that have received the message for each subsequent hop is exponentially lower, so it takes longer to reach all of them.
>>> [6**i for i in range(1, 6)]
[6, 36, 216, 1296, 7776]
>>> [8**i for i in range(1, 6)]
[8, 64, 512, 4096, 32768]
Assuming 5k nodes on the network, you can see how much of a difference this average makes. You can also reason about the probability of a duplicate - it is next to none for the first 3 hops and only on the 4th do we start seeing duplicates.
8 was chosen roughly based on these numbers - they weren't picked randomly at the time.
What the above numbers don't take into account is floodsub in the first hop - this is a problematic thing to reason about regardless, because it causes a lot of problems (privacy, bandwidth competition).
To extend on what @arnetheduck was saying, if we assume a system with uniform nodes, uniform latencies between nodes, and unsaturated bandwidth (i.e. we reason in time steps), we can try to model the diffusion process, the speed of diffusion, and what duplication happens when.
It is obviously a lot more complicated to model this in the real network, but there are still lots of similarities with the simple time step based model.
Time step based modelling
Worth noting that a node with degree D is first receiving from (at least) 1 node, and then sending to the remaining (at most) D-1. So the exponent is only D-1, except for the first step, where it is D or what the source floods to.
Now, let’s say that
As a first approximation, one can imagine diffusion as
However, this is overestimating
What one can think trough is where these nodes in
So a node in
- nodes in
$R_{i-1}$ : this is because the typical implementation sends to all its neighbours as soon as it gets the first copy. Even if it receives more copies in the same step, it will send the message to all except the first one. These are the cases where we would prefer to cancel a send, but it depends on the implementation whether this can be done. - nodes in
$R_i$ : these are the cases of “simultaneous cross-send”. For these, we have no chances of cancelling because of the information delay. - Finally, they will send to new nodes, but there will be overlaps.
More overlaps means a smaller
I’m far from having the exact stochastic formulation, and a more realistic continuous time model would be even more difficult. But it is clear that lots of nodes sending with a high D to a limited set of nodes results in lots of duplicates.
Edge count based modelling
Another way of looking at the diffusion, as a whole, is to realise that we are sending, overall, between
- lower bound: we have a symmetric D-regular graph with N nodes, which means we have
$N*D/2$ edges. A message will pass on each link at least once. There is no mechanism in the base protocol that would stop this from happening. - upper bound: worst-case, each message would travel on every link twice, in both directions. However, we never send back a message on the first link we received it from, hence the
$D-1$ .
So on average we have
But where are we in this range? That depends on the speed of diffusion, which implicitly depends on D again. The faster the diffusion, the less informed we are about the state of the peer node, hence the more cross-sends we will have. So overall, a larger D is bad from the perspective of duplicates twice. It is more sending, and it is more cross-sending.
With this long premise, let me focus on the approaches
The main difference I see is that Approach 1 and 2 tries to somehow estimate the diffusion state and act based on that. Approach 3 is instead just a simple parameter modification.
Approach 3
simply reducing D has advantages for reducing duplicates, but clearly at a cost of a delay increase. It is a parameter that has to be evaluated based on the expected number of nodes, and the latency allowed for the actual use case. I’m not convinced we are using the right D for all our topics, or that we should use the same for new topics. In fact for DAS we’ve used smaller values in simulations, but there we are not in the “unsaturated bandwidth” case, which makes a huge difference.
Approach 1
The issue I see with this time-based approach is that it amplifies the “state” of the network. If all is good and we are on the happy path, it keeps diffusion fast while reducing duplicates. But if we are not on the happy path, and diffusion is already slow for any reason, it slows it down even more. This makes me a bit sceptical. What we need is to estimate diffusion state, and time is a strange estimator for that. The problem, as I see it, is that we might not have a better estimator.
Approach 2
I’m a big fan of this approach as a generic GossipSub mechanism (that’s why I’ve also proposed it in the past). This of course does not mean that it is the right tool for some specific use cases where privacy must be preserved. So we might not use it on all topics, but e.g. for DAS it might make sense.
The Push-Pull phase transition
Independent of how we estimate the diffusion state, with approach 1 or 2, I would modify node behaviour differently from what you describe here. This is what I’ve called Push-Pull phase transition in https://ethresear.ch/t/fulldas-towards-massive-scalability-with-32mb-blocks-and-beyond/19529#duplicate-reduction-techniques-24.
Instead of simply changing D in later stages, I would reduce the number of links we push on, but I would keep the HAVE information flowing on the others.
- For large messages (several IP packets), this can be done with a simple IHAVE message.
- For small messages (one-two IP packets) having an individual IP packet just to say IHAVE does not make sense. But on topics where we have small messages, we usually have lots of messages. So we can e.g. keep sending messages with small hop counts, and piggyback IHAVEs for the rest.
I don't know how much of a bandwidth/latency gain we could make if we sacrificed pseudo-anonymity. If its >95%, perhaps it's worth it?
Maybe it doesn't matter here, but I would like to point out that the upper bound of the saving is 75%. Currently a node receives 4 copies and send another 4 copies on average. The best you can do is to receives a copy and send another copy (which is just 75% reduction).
I also don't have a really strong sense of how anonymous our nodes already are,
But, I think I have a strong sense of how much anonymity the CL currently provides (let me address all the mentioned issues below).
and whether the argument to keep pseudo-anonymity when it could be already broken is worth not even exploring potential upsides in dropping it.
I think claiming that it's already broken is kind of an over-claim (unless you take flood publishing into account). People keep telling me that anonymity is already broken without telling me how, so we don't have an evidence yet that it's already broken.
It would then be a small matter of figuring out which validators that IP address had, but you could do that by connecting to them and watching what subnets they subscribed to and associated attestations.
That sounds easy, but I don't see how it's easy or a small matter at all. Watching subnets they subscribe to doesn't tell which validators that IP address has. Each att subnet has about 400 validators at one slot and about 1,000 nodes, how is it a small matter to figuring that out?
Even if you watch the traffic metric like the message arrival time and you can reduce the number of candidates to like 10 (made-up number), that's still a kind of anonymity and it's not broken yet. (you have to guess more which one from those 10 nodes).
Even worse, for years we had flood publishing
I agree that flood publishing kills anonymity, but we don't want to do that anymore, so we are good.
Again once you have that, you connect to those nodes, and watch their traffic to determine which validators they have.
As I said earlier, watching the traffic doesn't tell you exactly which validator they have.
In the early days, a researcher built a few sentry nodes on the network and watched gossipsub traffic and tied IP addresses to all validators, with some statistical uncertainty. It showed even with all the measures in place, it was possible to tie IP addresses to validator indexes, just through gossipsub traffic. I suspect the EF devops team has also done something similar (cc @parithosh @barnabasbusa )?
That kind of research is very valuable. I would love to see how you do it and analyze the attack. I thin it's worth making it a formal ethresearch post.
In conclusion, I will accept it if we have a consensus that Ethereum is not supposed to provide (pseudo-)anonymity, but claiming that it's already broken is an over-claim.
- lower bound: we have a symmetric D-regular graph with N nodes, which means we have N * D / 2 edges. A message will pass on each link at least once. There is no mechanism in the base protocol that would stop this from happening.
- upper bound: worst-case, each message would travel on every link twice, in both directions. However, we never send back a message on the first link we received it from, hence the D − 1 .
I want to point out that from this study https://ethresear.ch/t/number-duplicate-messages-in-ethereums-gossipsub-network/19921, we are already very close to the lower bound. That is, we already have around D/2 duplicates.
So it's already unlikely that a message would travel on a link twice, which is kind of a corner case (at least for the current mainnet).
That sounds easy, but I don't see how it's easy or a small matter at all. Watching subnets they subscribe to doesn't tell which validators that IP address has. Each att subnet has about 400 validators at one slot and about 1,000 nodes, how is it a small matter to figuring that out?
This has been done before, in the early days, maybe its different now.
For simplicity, lets say we target a specific peer, and therefore a specific IP (although it might be more beneficial to have a node with D = 5000 and try and get on everyone's mesh). Depending on the client, you can know instantly which long-lived subnets it will be subscribed to so you avoid those subnets (but you remain subscribed to all subnets). When a validator produces an attestation, it will send that attestation to you (because you are subscribed to those subnets) with some statistical uncertainty. You wait for that and that's it you're done. It will send you the attestation which indicates it's validator id. You know its that validator and not someone else, because it is not subscribed to the subnet, so it is not forwarding any attestation. Validators don't subscribe to subnets unless they are aggregators.
Even if that wasn't the case, you could make sure you are on their mesh, and watch the unique messages on subnets that come from them first. Over-time you can identify their validators, because they will be the only ones sending you their attestations (it will be unlikely to get an attestation from another person than from them, although not impossible). Correlating this timing over a few epochs should allow you to narrow it down to exact validators with decent accuracy, i'd imagine.
If you are still not convinced, we can do this as a side project, to prove the idea.
@cskiraly thanks for the insightful input. A couple of clarifications:
Approach 1
How do you define "happy path"? And what would be reasons for "diffusion is already slow for any reason"? E.g., does it include nodes being CPU-overloaded and dropping network-related events?
The Push-Pull phase transition
What does this involve exactly? I'm not sure I follow. Reducing D
, leaving the frequency of IHAVE
as is and adjusting what we send IHAVE
s for (i.e., for larger messages only)?
Validators don't subscribe to subnets unless they are aggregators.
Yeah, I wasn't aware that validators don't subscribe to subnets. That kind of makes sense because finding and connecting to mesh peers are not cheap and they have to do every epoch.
If nodes use fan-out peers to publish, it kills anonymity and I was aware of that.
because they will be the only ones sending you their attestations (it will be unlikely to get an attestation from another person than from them, although not impossible).
I'm not convinced by this. In fact, it's very likely to get from another person. Let me explain why.
Let's assume you are 2 hops away from the publisher (publisher <--> middle man <--> you). Because there are only two hops, the latency will be low, so this path is a fast path. the question is how am I sure that there is no faster path?
Assume that there are 10k nodes in the network. You can see that there are only 8^1, 8^2, 8^3 nodes in the first hop, second hop, and the third hop respectively.
- So there is only 8^2/10k = 0.6% chance that there will be another 2-hop path from the publisher to you.
- In fact, you also have only 8^3/10k = 5% chance that there will be another 3-hop path from the publisher to you.
So I can say with confidence that all other paths from the publisher to you are at least 4 hops long. So the 2-hop path through that middle man is the only shortest path. It also means you will always receive the message from the middle man rather than the original publisher.
Do you see now that it's very likely that you will always receive the message from someone that's not the publisher?
with decent accuracy
Now, I will show you that it's not accurate at all.
As I said, there are only 8^2 nodes in the second hop, but there are only 8^1 in the first hop (direct mesh peer to the publisher).
Assuming that all the second hop peers always receive the message from one of the peers (their middle mans), it can happen with high probability as I mentioned above.
Now you do the attack and always receive the message from one peer. But you can see that there are only 8 mesh peers to the publisher, so you will have the chance at most 8/8^2 = 8/64 = 12.5% that you are directly connected to the publisher.
So 12.5% is not just decent accuracy, but it's too way low.
You can see from my numbers that gossipsub provides pretty good anonymity, but I agree that it's only for subscribing nodes.
By the way, I'm sorry for the late reply.
@cskiraly thanks for the insightful input. A couple of clarifications:
Approach 1
How do you define "happy path"? And what would be reasons for "diffusion is already slow for any reason"? E.g., does it include nodes being CPU-overloaded and dropping network-related events?
Happy path in this case is just diffusion as you would expect based on only bandwidth and baseline network latency constraints.
The reasons for slow diffusion can be many, from temporary network outages to CPU overload, or simply late release of the message itself because of e.g. a disk I/O hiccup or waiting on something which supposed to be immedate.
The Push-Pull phase transition
What does this involve exactly? I'm not sure I follow. Reducing
D
, leaving the frequency ofIHAVE
as is and adjusting what we sendIHAVE
s for (i.e., for larger messages only)?
Swithing to "pull mode" would mean sending only HAVE information on a mesh link, and letting the other side WANT the message. So this is changing how you send the message to your mesh peer. You do not "push" it, just let it know you have it.
Now of course there are many subtle variants we can play with, e.g.:
- you can do this on all your mesh links, or only on parts of them, and the ratio could depend on the diffusion estimate
- depending on message size and frequency, you might want to send HAVE messages in groups. Large message: you send an IHAVE. Frequent small messages: it does not make sense to send indiviudal HAVE messages, but send it batched, e.g. every 100ms or 20 messages, which comes first.
@ppopth - All good for the late reply. Thanks for clarifying.
I think the arguments you are making are for the block topic, where mesh's are relatively stable and that the de-anonymizing node has a mesh degree of 8.
If i wanted to de-anonymize validators, I'd create a node on the subnets, set my mesh degree to 20,000, and continually send grafts to every node until I was on every node's mesh. I therefore have a direct connection to everyone on the subnets (which often get formed and unformed as nodes subscribe/unsubscribe when they are aggregators).
I've had lighthouse been eclipsed in a similar manner before, so I know this kind of thing is possible.
It would take a little bit of engineering to do this and keep track of which validators are which, but again, I think its more possible than you might think.
But also because we publish to our fan-out and not-subscribe, that also is mostly game-over for anonymization anyway.
I'd create a node on the subnets, set my mesh degree to 20,000
It's also good to note that that requires like 2,000 times more bandwidth than normal full nodes. If you have that high bandwidth, you could do more extraordinary things like splitting the network which results to the liveness issue of the chain. Even though it's achievable to get a 50Gbps machine, it makes more sense to exclude it from the threat model, because otherwise we are doomed anyway.
But that doesn't matter anyway because as you said publishing to fan-out peers kills anonymity and nothing else matters.
Maybe it doesn't matter here, but I would like to point out that the upper bound of the saving is 75%. Currently a node receives 4 copies and send another 4 copies on average. The best you can do is to receives a copy and send another copy (which is just 75% reduction).
I would like to correct this. We don't receive 4 copies and send 4 copies. We expect to receive 8 copies (which can be lower than that if some mesh peers are dead).
I also put the correction in the original research result here.
https://ethresear.ch/t/number-duplicate-messages-in-ethereums-gossipsub-network/19921/4?u=pop