libp2p/go-libp2p

Project Flare (decentralised Hole Punching) Phase 1 Meta Issue

aarshkshah1992 opened this issue · 11 comments

This issue is to aggregate all the work and PRs we have open for taking Hole Punching to completion.

Done criteria: we recommend key consumers like go-ipfs and lotus enable it by default because the functionality works, is well tested, and has quality assertions in place to prevent regressions.


Solve Simultaneous connect in multi-stream for TCP hole Punching (Ready for Review)

go-multistream suffers from the simultaneous connect problem: negotiation fails when two peers try to connect to each other at the same time. This blocks hole punching. This is ONLY a problem for TCP and NOT for QUIC, because QUIC does NOT use the multistream protocol for negotiating the security and stream multiplexing protocols. The PRs below solve this problem using the method specified in libp2p/specs#196.
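To make the tie-break idea concrete, here is a minimal sketch in Go. It assumes the approach discussed in libp2p/specs#196 boils down to each peer drawing a random 64-bit nonce, with the larger nonce taking the initiator role and equal nonces forcing a retry. That rule, and every name below (`Role`, `newNonce`, `resolveRole`, `ErrNonceTie`), is an illustration and not the actual go-multistream API.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
)

// Role is the part a peer plays once the simultaneous connect is resolved.
// Hypothetical type, for illustration only.
type Role int

const (
	RoleInitiator Role = iota
	RoleResponder
)

// ErrNonceTie signals that both peers drew the same nonce and must retry
// with fresh values.
var ErrNonceTie = errors.New("equal nonces, retry with fresh values")

// newNonce draws a random 64-bit tie-break value.
func newNonce() (uint64, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint64(b[:]), nil
}

// resolveRole decides which side acts as initiator.
// Assumption: the peer with the larger nonce becomes the initiator.
func resolveRole(local, remote uint64) (Role, error) {
	switch {
	case local > remote:
		return RoleInitiator, nil
	case local < remote:
		return RoleResponder, nil
	default:
		return 0, ErrNonceTie
	}
}
```

Both peers run the same comparison on the same pair of nonces, so they reach opposite roles without any extra round trip.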


Swarm & Transport changes for Hole-Punching


Emit event for NAT device type


Limited Relay Protocol


Hole-Punching Co-ordination via Relay Server


Integrate Limited Relays


QUIC Changes


AutoRelay


Documentation


Automated testing to prevent regressions

  • libp2p/test-plans#21
    • 2022-11-07 note: this item is deemed out of scope. It is, unfortunately, a casualty of a time when functionality was released without accompanying end-to-end test coverage. Backfilling test coverage for important functionality (including hole punching) is on the libp2p/test-plans roadmap and also depends on some work in Testground itself. The issue above will get tackled as part of those efforts.

Other

The below is a documentation issue that we will do as a best effort.

  • #1017 (document libp2p UPnP and manual port forwarding).

Hi everyone,

I would love to integrate this decentralised Hole Punching mechanism in pcp. Just by reading this issue I don't really see the big picture of how the mechanism is working exactly, which new APIs are introduced and generally how to leverage the new capabilities. Are there more resources available? If yes, could you point me in this direction?

vyzo commented

Hi @dennis-tra

Currently we have too many balls in the air, so it might be a little cumbersome to integrate.
There is a testing/flare branch that should have most of the pieces together.

For the big picture:

  • You need one or more limited relay servers. In the long run this will be running in every public DHT node, but for now you need to run one explicitly. There is a standalone daemon implementation in https://github.com/vyzo/libp2p-relay
  • You need to enable v2 relay in go-libp2p; this is handled in testing/flare
  • If you use QUIC (strongly recommended), you need to integrate libp2p/go-libp2p-quic-transport#194
  • You may need to get your nodes to speak autorelay with v2 relays; there is preliminary work in #1058 but it's currently incomplete and broken.

We are currently running alpha testing within PL, using this set of programs, which is also a good start in figuring out the integration: https://github.com/vyzo/libp2p-flare-test

Btw, if you want to participate in the flare alpha test we'll be happy to have you -- I can send you the config with the necessary tokens.

> Currently we have too many balls in the air, so it might be a little cumbersome to integrate.
> There is a testing/flare branch that should have most of the pieces together.

Alright, I totally understand!

> For the big picture:
>
>   • You need one or more limited relay servers. In the long run this will be running in every public DHT node, but for now you need to run one explicitly. There is a standalone daemon implementation in https://github.com/vyzo/libp2p-relay
>   • You need to enable v2 relay in go-libp2p; this is handled in testing/flare
>   • If you use QUIC (strongly recommended), you need to integrate libp2p/go-libp2p-quic-transport#194
>   • You may need to get your nodes to speak autorelay with v2 relays; there is preliminary work in #1058 but it's currently incomplete and broken.
>
> We are currently running alpha testing within PL, using this set of programs, which is also a good start in figuring out the integration: https://github.com/vyzo/libp2p-flare-test

Thanks for the explanation 👍

> Btw, if you want to participate in the flare alpha test we'll be happy to have you -- I can send you the config with the necessary tokens.

That would be great, I'd be happy to test :) I'll shoot you a mail at libp2p-at-libp2p.io ?

vyzo commented

yeah, email at libp2p-at-libp2p.io would be fine.

Circuit v2

  • libp2p/go-libp2p-circuit#125 implements the circuit v2 spec
    • It might need some work to reduce the attack surface.
  • Remove v1 circuit relay on the server side. The client has a built-in fallback to v1, and PL will continue to run v1 relays for a while.
  • Build logic (using autonat events) to start a v2 relay if the node is publicly accessible.
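The "start a v2 relay if the node is publicly accessible" item could look roughly like the sketch below. It is self-contained and does not use the real go-libp2p event bus or autonat types: `Reachability`, `relayService`, `relayManager` and `fakeRelay` are all hypothetical names, and the choice to stop the relay again when the node is no longer public is an assumption, not something stated above.

```go
package main

// Reachability mirrors the kind of status an autonat event would carry.
// Hypothetical type; the real event type lives in go-libp2p.
type Reachability int

const (
	ReachabilityUnknown Reachability = iota
	ReachabilityPublic
	ReachabilityPrivate
)

// relayService is a hypothetical handle on a v2 relay that can be started
// and stopped.
type relayService interface {
	Start() error
	Stop() error
}

// relayManager runs the relay only while the node is publicly reachable.
type relayManager struct {
	relay   relayService
	running bool
}

// onReachabilityChanged reacts to a reachability update. It is idempotent:
// repeated identical events do not start or stop the relay twice.
func (m *relayManager) onReachabilityChanged(r Reachability) error {
	switch {
	case r == ReachabilityPublic && !m.running:
		if err := m.relay.Start(); err != nil {
			return err
		}
		m.running = true
	case r != ReachabilityPublic && m.running:
		// Assumption: a node that stops being public stops relaying.
		if err := m.relay.Stop(); err != nil {
			return err
		}
		m.running = false
	}
	return nil
}

// fakeRelay is a stand-in used to exercise relayManager; it only counts calls.
type fakeRelay struct {
	starts, stops int
}

func (f *fakeRelay) Start() error { f.starts++; return nil }
func (f *fakeRelay) Stop() error  { f.stops++; return nil }
```

In a real node the events would come from an event bus subscription, and `relayService` would wrap the circuit v2 relay implementation.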

Tentative timeline: Get this ready for the go-ipfs v0.10 release.

Hole Punching

  • QUIC support: libp2p/go-libp2p-quic-transport#194
  • libp2p support: #1057
    • Reduce the number of retries to 3. Data shows that most connections succeed within one or two attempts and are unlikely to succeed with more.
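A bounded retry loop for the point above might look like this sketch. `maxHolePunchAttempts`, `holePunchWithRetry` and `ErrHolePunchFailed` are hypothetical names, and `attempt` stands in for whatever performs a single hole punch; none of this is taken from #1057.

```go
package main

import (
	"errors"
	"fmt"
)

// maxHolePunchAttempts caps the number of attempts at 3, as proposed above.
const maxHolePunchAttempts = 3

// ErrHolePunchFailed is returned (wrapped) once every attempt has failed.
var ErrHolePunchFailed = errors.New("hole punch failed")

// holePunchWithRetry calls attempt until it succeeds or the cap is reached.
// It returns how many attempts were made and the final error, if any.
func holePunchWithRetry(attempt func() error) (int, error) {
	var lastErr error
	for i := 1; i <= maxHolePunchAttempts; i++ {
		if lastErr = attempt(); lastErr == nil {
			return i, nil
		}
	}
	return maxHolePunchAttempts, fmt.Errorf("%w after %d attempts: %v",
		ErrHolePunchFailed, maxHolePunchAttempts, lastErr)
}
```

A real implementation would presumably also honour a context deadline and record per-attempt metrics, which is how data like the above gets collected.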

Tentative timeline: We should defer enabling this until a critical mass of nodes have upgraded and are running v2 relays.

AutoRelay

Once a critical mass of v2 relays has been deployed, rewrite the code to use v2 relays (reserve slots and refresh reservations).
The selection logic should be two-fold:

  • Use the DHT to find the closest peers in KAD space, and reserve slots with 2 (or 3) relays from those candidates. Detecting v2 relay support is easy: we can simply wait for identify and see if the node supports the v2 hop protocol.
  • Provide an option to configure static v2 relays for users who run their own relays (without a default list! We want to stop running relays and being a point of centralization).
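The two-fold selection could be sketched as follows. The protocol ID is the circuit v2 hop protocol; everything else is an assumption made for illustration: the `candidate` type, the function names, and in particular the rule that a configured static list takes precedence over DHT candidates. The sketch also assumes `dhtClosest` is already sorted by closeness in KAD space and that each candidate's protocol list was filled in by identify.

```go
package main

// hopProtocolV2 is the protocol ID a circuit v2 relay advertises via identify.
const hopProtocolV2 = "/libp2p/circuit/relay/0.2.0/hop"

// candidate is a hypothetical view of a peer after identify has completed.
type candidate struct {
	ID        string
	Protocols []string
}

// supportsRelayV2 reports whether the peer advertises the v2 hop protocol.
func supportsRelayV2(c candidate) bool {
	for _, p := range c.Protocols {
		if p == hopProtocolV2 {
			return true
		}
	}
	return false
}

// selectRelays picks up to want relays to reserve slots with.
// Assumption: user-configured static relays win over DHT candidates and are
// trusted to speak v2; DHT candidates are filtered by protocol support.
func selectRelays(static, dhtClosest []candidate, want int) []candidate {
	if want <= 0 {
		return nil
	}
	if len(static) > 0 {
		if len(static) > want {
			return static[:want]
		}
		return static
	}
	var picked []candidate
	for _, c := range dhtClosest {
		if !supportsRelayV2(c) {
			continue
		}
		picked = append(picked, c)
		if len(picked) == want {
			break
		}
	}
	return picked
}
```

Calling `selectRelays(nil, closest, 2)` would then yield the two closest peers that advertise the hop protocol, after which the node reserves a slot with each and refreshes the reservations.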

Closing this issue. Relay v2 and circuit v2 were rolled out with v0.16.0.

@marten-seemann: I'm viewing the done criteria as meaning "we recommend key consumers like go-ipfs and lotus enable it by default".

I think we should review what that specifically means, but I know that in the immediate term we need #1351.

To be crisp here, can you please add anything to the checklists above that still needs to be done and remove the items that are no longer relevant (e.g., I saw you flag #1017 in #1122 as not relevant)?

To make sure I don't cause any extra churn, we can then have a quick sync to make sure we're on the same page. Sound good? Thanks for staying on this trail to its close.

@marten-seemann : I updated the issue description to include the issues that I believe are open and need to be closed before we call this done:

  1. #1351
  2. libp2p/docs#110
  3. libp2p/test-plans#21

I attempted to be clear in each issue about what's needed. Feel free to edit/expand.

#1017 was on the existing todo list. Do we need that or can it be removed?

@marten-seemann @p-shahi : I want to get this closed out. I know we have already agreed to defer the regression test. What is the priority of #1017 ? Should we complete that before marking this as done?

Looking at the issue, it seems the remaining work is to document and add an example for UPnP.
If users are running into issues with hole punching and have requested docs/examples like this, then I say we treat this as high priority.
I'm not sure if that's the case, however, so I suggest we close this issue and track #1017 as best effort.

Moved #1017 to the Best Effort Track and marked this as complete.