Use QUIC for all communications between peers
QUIC is a network protocol designed by Google, implemented in Chrome, and used by various Google services such as YouTube and Maps. It covers the scope of TCP+TLS, but it is implemented on top of UDP. Standardization is in progress at the IETF:
- QUIC itself: https://tools.ietf.org/html/draft-ietf-quic-transport-16
- HTTP over QUIC, now named HTTP/3: https://tools.ietf.org/html/draft-ietf-quic-http-16
Here is what could be interesting for us:
- it's supposed to be more efficient than TCP.
- it's a generic protocol with encrypted communications and will be used for HTTPS connections, so Ethereum nodes' traffic will be harder for an external actor to identify or block
- in some circumstances, low cost for establishing a new communication (0 RTT)
This last point is very interesting because it makes it cheap to connect to many peers. That's especially useful for attesters and block producers: they need to push their signatures/blocks, and contacting more nodes lowers the impact of a Sybil attack at the p2p level (#6). It's also interesting if we want to go the Tor route (GitHub issue to be created). There is no magic in the 0-RTT trick, however: it works by caching the communication keys.
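To make the caching point concrete, here is a minimal sketch of which contacts would be eligible for 0-RTT. It is a toy model, not a real QUIC or TLS API: the `ResumptionCache` class, the ticket strings, and the peer names are all hypothetical, and it assumes every peer issues a resumption ticket after the first full handshake.

```python
# Toy model (not a real QUIC/TLS API): which contacts are eligible for 0-RTT?
class ResumptionCache:
    """Remembers peers for which a (hypothetical) resumption ticket is cached."""

    def __init__(self):
        self._tickets = {}

    def handshake_round_trips(self, peer_id: str) -> int:
        """Round trips spent before application data can be sent to peer_id."""
        if peer_id in self._tickets:
            # Cached keys: application data can ride in the first flight.
            return 0
        # First contact: full 1-RTT handshake, after which we assume
        # the peer issues a ticket that we store for next time.
        self._tickets[peer_id] = "ticket-for-" + peer_id
        return 1


cache = ResumptionCache()
first_round = [cache.handshake_round_trips(p) for p in ("a", "b", "c")]
second_round = [cache.handshake_round_trips(p) for p in ("a", "b", "d")]
print(first_round, second_round)
```

In this model only repeat contacts (`a` and `b` in the second round) skip the handshake; a peer seen for the first time (`d`) still pays one round trip.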
As of today, it's a work in progress: even though it has been used at Google for a while, the standardization is not finished (for a high-level picture of the impact, see https://blog.cloudflare.com/the-road-to-quic/). It's being implemented by the libp2p team. Other implementations are listed here: https://github.com/quicwg/base-drafts/wiki/Implementations. There is no need to rush, but we can track the progress in this issue. On our side (Consensys/PegaSys), we will give it a first try in December.
Have the simulations for using QUIC in sharding been completed? If so, are there any results to share?
When we tried in December (with libp2p), we had packaging issues, so we decided to pause it. We're going to try again soon (within ~4 weeks) on Handel.
> in some circumstances, low cost for establishing a new communication (0 RTT)
> This last point is very interesting, because it allows to connect to a lot of peers
It would be very interesting to verify how efficient this is for real. Setting up a QUIC connection isn't free. What you can do with zero-roundtrip connects is to send encrypted/authenticated data in the first packet. Setting up an interactive connection will probably still require roundtrips.
> Setting up a QUIC connection isn't free
From my understanding, setting up a QUIC connection requires 1 packet, whereas TCP requires a 3-way handshake. It's much easier to send 1 packet to multiple peers than to do a 3-way handshake with each of them.
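For reference, the setup costs being compared can be tabulated as round trips before the client may send its first byte of application data. This is a rough sketch: the round-trip counts are the commonly cited ones for full handshakes, while the function name and the 50 ms RTT are illustrative assumptions. It ignores packet loss, address validation, and CPU cost.

```python
# Round trips before the client may send its first application byte.
SETUP_ROUND_TRIPS = {
    "TCP + TLS 1.2": 1 + 2,        # TCP handshake + 2-RTT TLS 1.2 handshake
    "TCP + TLS 1.3": 1 + 1,        # TCP handshake + 1-RTT TLS 1.3 handshake
    "QUIC (new connection)": 1,    # transport and crypto handshake combined
    "QUIC (0-RTT resumption)": 0,  # data sent with the first flight
}


def setup_delay_ms(transport: str, rtt_ms: float) -> float:
    """Time spent on connection setup before the first data can be sent."""
    return SETUP_ROUND_TRIPS[transport] * rtt_ms


RTT_MS = 50.0  # hypothetical round-trip time between two peers
for name in SETUP_ROUND_TRIPS:
    print(f"{name}: {setup_delay_ms(name, RTT_MS):.0f} ms")
```

Under these assumptions a new QUIC connection still costs one round trip; only a resumed connection can send data in its very first packet.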
We evaluated quic-go as a transport layer for the Handel framework:
https://github.com/ConsenSys/handel/
We observed a 3x slowdown compared to the UDP-based network (experiments on 500 one-core AWS nodes).
The most important factors we identified are:
- the 0-RTT handshake is not supported in quic-go yet (with UDP there is no handshake at all)
- QUIC encrypts by default (our UDP communication is not encrypted) and Handel is CPU-intensive (BLS signature verification), so the whole protocol slows down due to CPU overload.
@marten-seemann and @bkolad have been chatting offline about the QUIC experiment. A slowdown of 3x is unexpected and Marten has provided some guidance about elements to adjust, such as congestion control sizing, preestablishing connections, the `AcceptCookie` callback (which by default adds 1-RTT) and others.
@bkolad were you able to iterate on those? Is there a stress test in https://github.com/ConsenSys/handel/ that we could use to replicate your setup and test scenario?
I quickly reviewed the QUIC network implementation. Unless I'm mistaken, it seems to be thrashing sessions (opening a QUIC session, reading one packet, then closing the QUIC session).
Renegotiating QUIC sessions on every packet is likely a big cause of slowdown. With this behaviour, the UDP and QUIC versions aren't really comparable.
Could you please keep QUIC sessions open and run the benchmark again?
I filed an issue with details: Consensys/handel#126.
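As a rough illustration of why session thrashing matters, the sketch below counts handshake round trips for a stream of messages to a single peer. The function name and the figures (1000 messages, one round trip per handshake) are hypothetical and are not taken from the Handel benchmark.

```python
def handshake_round_trips(messages: int, per_handshake: int, keep_open: bool) -> int:
    """Total handshake round trips paid to deliver `messages` to one peer."""
    if messages == 0:
        return 0
    if keep_open:
        # One handshake, then every message reuses the open session.
        return per_handshake
    # Thrashing: a fresh session (and handshake) for every message.
    return per_handshake * messages


thrashing = handshake_round_trips(1000, per_handshake=1, keep_open=False)
persistent = handshake_round_trips(1000, per_handshake=1, keep_open=True)
print(thrashing, persistent)
```

With these numbers, reopening a session per message pays the handshake 1000 times, while a persistent session pays it once, which is why the two setups are hard to compare directly.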
@raulk @marten-seemann
Please see more details here:
Consensys/handel#4
The initial slowdown I reported was 4x; after implementing the `AcceptCookie` callback it went down to 3x. At this point I was happy with the result, as I think the handshake and encryption overhead are unavoidable (as I pointed out, Handel spends most of its CPU time on BLS signature verification, and QUIC encryption adds to it). I ran the stress tests on our custom test bed of 500 AWS nodes.
I agree the scenario is not directly comparable to the UDP case, and Handel fits the UDP model better. Our intention was not to compare QUIC to UDP but rather to switch to QUIC and check what happens for the Handel protocol (hoping that the 0-RTT handshake would do a miracle).
Thanks for filing the issue; I will give a more detailed answer regarding session management there.
For the ETH2.0 context, I think we should continue investigating the use of QUIC for communication between peers, as proposed by @nkeywal.
Thanks for the info, @bkolad!
> Our intention was not to compare QUIC to UDP but rather switch to QUIC and check what happens for handel protocol
IIUC, the UDP reification of the network in Handel doesn't set up a secure channel.
If encryption and authentication, parallel conversations (multiplexing), reliability or congestion control are non-requirements, then QUIC is a poor functional fit for this use case.
A more accurate comparison would be UDP + (overlaid multiplexing + encryption + congestion control) vs. QUIC.
In practice, Handel would not run in isolation but on the Serenity network where these aspects are relevant.
> (hoping that 0-RTT handshake would do a miracle)
Could you elaborate on this? In terms of what? Your UDP variant is not handshaking from what I gather.
> Could you elaborate on this? In terms of what? Your UDP variant is not handshaking from what I gather.
I was not being clear. For the reasons you pointed out, any stateful protocol (TCP/TLS, QUIC, etc.) would perform worse than UDP in terms of latency. We are thrashing sessions for every packet, so we pay the cost of a handshake every time. My intuition is that the latency should be ordered as:
QUIC > QUIC-0-RTT (when a peer contacts a node it has seen before, we wouldn't pay for the RTT) > UDP
and we thought it would be interesting to see how much 0-RTT helps here (by miracle I meant that the latency would be close to UDP).
> In practice, Handel would not run in isolation but on the Serenity network where these aspects are relevant.
Yes, that's why I think it is an interesting exercise to try out QUIC.
> Yes that's why I think it is interesting exercise to try out QUIC.
Yeah, and thanks for spearheading this effort in the Serenity community! I wanted to make sure we drew accurate conclusions out of your experiment, which we seem to agree on now. Cheers!