Move IPFS/libp2p specific components to ipfs/boxo and/or libp2p/go-libp2p-kad-dht
guillaumemichel opened this issue · 7 comments
All modules that are specific to the IPFS DHT (i.e. modules that cannot or should not be used in other DHT networks/implementations) should move to ipfs/boxo.
These modules include:
- IPFS Server mechanism, which only handles IPFS requests (basicserver can remain in this repo for testing purposes).
- IPFSv1 message module, including the IPFS protobuf message format and helpers.
The IpfsDHT struct should be defined directly in ipfs/boxo. This includes instantiating a new Libp2pEndpoint, building a RoutingTable, defining the server's behavior and the message format, and interacting with the query mechanism (to decide when each query should terminate). IPFS constants (e.g. bucket size, number of closer peers to return, IPFS DHT protocol ID, etc.) should also be defined directly in ipfs/boxo.
Note that consumers of the current go-libp2p-kad-dht repository will become consumers of ipfs/boxo/kad-dht, and NOT consumers of go-kademlia directly.
What should stay in go-kademlia:
- Provider Store implementation: it should be made generic so that other implementations can use a data store.
- Different routing tables (e.g. FullRT, ClientRT, LazyRT, etc.): even though they were built to serve the IPFS DHT, they are generic components that could be used in other Kademlia implementations.
- Libp2p Endpoint, which can be used by other Kademlia implementations.
The goal of this separation is to get the ground ready for the Composable DHT.
Thanks for this, I was planning to ask you to expand on your thinking here, so having this issue is really useful.
I think it would be good to work some of this into the design documentation. Currently the design has an IPFS DHT section that you could update to be more explicit about the boundaries between this repo's goals and IPFS-specific goals.
I note that peer routing is currently in that IPFS DHT section, but do you agree that it's a feature that is generally useful across all Kademlia deployments?
> All modules that are specific to the IPFS DHT (i.e. modules that cannot or should not be used in other DHT networks/implementations) should move to ipfs/boxo.
> These modules include:
> - IPFS Server mechanism, which only handles IPFS requests (basicserver can remain in this repo for testing purposes).
> - IPFSv1 message module, including the IPFS protobuf message format and helpers.

> Note that consumers of the current go-libp2p-kad-dht repository will become consumers of ipfs/boxo/kad-dht, and NOT consumers of go-kademlia directly.
As has been mentioned previously, both of these are related to the libp2p DHT spec (https://github.com/libp2p/specs/tree/c733210b3a6c042d01f6b39f23c0c9a3a20d3e88/kad-dht), not to the IPFS Public DHT specifically.
For some of the things specific to the IPFS Public DHT that should probably live in boxo, look at libp2p/go-libp2p-kad-dht#597 and linked issues. They include things like the protocol name(s), put/get validators, network constants like k, record expiration times, routing table refresh intervals, etc.
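Validators are a good example of something a network plugs in rather than something the DHT hard-codes. As a hedged sketch, the Go code below redeclares an interface with the same method set as go-libp2p-record's Validator (Validate and Select) so that it has no dependencies, plus a namespace router in the spirit of that library's NamespacedValidator. The nonEmpty validator and the "example" namespace are hypothetical; an IPFS configuration would register its own namespaces instead (in go-libp2p-kad-dht these include pk and ipns).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Validator has the same method set as go-libp2p-record's Validator; it is
// redeclared here so the sketch compiles without dependencies.
type Validator interface {
	// Validate reports whether value is acceptable for key.
	Validate(key string, value []byte) error
	// Select returns the index of the best value among several for key.
	Select(key string, values [][]byte) (int, error)
}

// Namespaced routes a key like "/ns/rest" to the validator registered for "ns".
type Namespaced map[string]Validator

func (n Namespaced) lookup(key string) (Validator, error) {
	parts := strings.SplitN(key, "/", 3)
	if len(parts) != 3 || parts[0] != "" {
		return nil, errors.New("key is not namespaced")
	}
	v, ok := n[parts[1]]
	if !ok {
		return nil, fmt.Errorf("no validator for namespace %q", parts[1])
	}
	return v, nil
}

func (n Namespaced) Validate(key string, value []byte) error {
	v, err := n.lookup(key)
	if err != nil {
		return err
	}
	return v.Validate(key, value)
}

func (n Namespaced) Select(key string, values [][]byte) (int, error) {
	v, err := n.lookup(key)
	if err != nil {
		return 0, err
	}
	return v.Select(key, values)
}

// nonEmpty is a hypothetical validator: any non-empty value is valid and the
// first value wins.
type nonEmpty struct{}

func (nonEmpty) Validate(_ string, value []byte) error {
	if len(value) == 0 {
		return errors.New("empty value")
	}
	return nil
}

func (nonEmpty) Select(_ string, values [][]byte) (int, error) {
	if len(values) == 0 {
		return 0, errors.New("no values")
	}
	return 0, nil
}

func main() {
	// Each network decides which namespaces it accepts.
	v := Namespaced{"example": nonEmpty{}}
	fmt.Println(v.Validate("/example/foo", []byte("x")))
	fmt.Println(v.Validate("/other/foo", []byte("x")))
}
```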
@guillaumemichel IIUC, moving all the components to boxo is also inconsistent with your comment in libp2p/go-libp2p-kad-dht#846 (comment), where a libp2p DHT user (who is not using the IPFS Public DHT) reasonably wants to keep using their DHT without bringing in IPFS dependencies.
If you don't want any libp2p components in this repo, then this likely means creating a barebones libp2p dht that uses this implementation, as an alternative client/server implementation in go-libp2p-kad-dht. However, note that this means that it is likely that many PRs to modify DHT behavior will end up as multiple PRs, with the associated overhead of bubbling, as has been flagged previously as the cost of having a separate repo here.
The confusion between the IPFS DHT and the libp2p DHT is expected to be addressed by the Composable DHT. Until then, we need to be very careful with naming and dependencies generally.
- libp2p DHT implementation: a Kademlia implementation defining a message format and a server behavior, and offering the following RPCs: FIND_NODE, ADD_PROVIDER, GET_PROVIDERS, PUT_VALUE, GET_VALUE.
- IPFS DHT implementation: an instantiation of the libp2p DHT implementation using custom parameters (such as bucket size, protocol identifier, routing table refresh interval, etc.).
- It is possible to instantiate a new libp2p DHT network by using a dedicated protocol ID and a set of bootstrap nodes (as Celestia and others are doing).
- IPFS DHT network: the swarm of peers running the IPFS DHT implementation (using the IPFS DHT protocol ID). AFAIU, libp2p applications that use a DHT but do not have a dedicated DHT network use the IPFS DHT network. As long as this is true, default libp2p DHT network == IPFS DHT network. So libp2p peer routing depends on the IPFS DHT network, and also on the IPFS DHT implementation (boxo), because for now we don't want different bucket sizes or refresh intervals in the same network. This could be changed by giving the libp2p DHT network distinct bootstrap peers (not connected to nodes in the IPFS DHT network), but that may be insecure if this DHT network isn't well populated.
@aschmahmann I agree with everything you wrote. go-kademlia is a generic Kademlia implementation (genericity is required not to build BitTorrent implementations, but to build new features such as the Composable DHT and the Double Hash DHT, and generally to facilitate improving the IPFS DHT). For this reason, and because it doesn't depend on libp2p other than as a possible transport, go-kademlia should not be the libp2p DHT implementation. However, the libp2p DHT implementation (e.g. go-libp2p-kad-dht) should depend on go-kademlia. And finally, the IPFS DHT implementation (e.g. boxo) should depend on the libp2p DHT implementation.
The libp2p DHT implementation should define the IpfsDHT (or Libp2pDHT?) struct, the server behavior (request handling), and the message format. The IPFS DHT implementation should only define parameters of the libp2p DHT implementation, such as protocol ID, bucket size, refresh interval, etc. So the IPFS DHT implementation would be an instantiation of the libp2p DHT implementation, which itself depends on go-kademlia for the Kademlia routing logic.
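A minimal Go sketch of this layering, with hypothetical names (Config, KadDHT, New, IPFSConfig): the libp2p DHT implementation exposes a configuration and a constructor, and the IPFS DHT implementation contributes nothing but a function returning parameter values. The protocol ID /ipfs/kad/1.0.0 and bucket size 20 are the IPFS DHT's values; the 10-minute refresh interval and the /example/kad/1.0.0 protocol ID are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// Config holds the knobs a libp2p DHT implementation could expose; the
// routing logic itself would come from go-kademlia.
type Config struct {
	ProtocolID      string
	BucketSize      int
	RefreshInterval time.Duration
	BootstrapPeers  []string
}

// KadDHT stands in for the libp2p DHT type.
type KadDHT struct{ cfg Config }

// New validates cfg and builds the DHT.
func New(cfg Config) (*KadDHT, error) {
	if cfg.ProtocolID == "" {
		return nil, fmt.Errorf("protocol ID is required")
	}
	if cfg.BucketSize <= 0 {
		return nil, fmt.Errorf("bucket size must be positive, got %d", cfg.BucketSize)
	}
	return &KadDHT{cfg: cfg}, nil
}

// IPFSConfig is all the IPFS layer would add: parameter values, no behavior.
// Bootstrap peers are omitted here.
func IPFSConfig() Config {
	return Config{
		ProtocolID:      "/ipfs/kad/1.0.0",
		BucketSize:      20,
		RefreshInterval: 10 * time.Minute, // illustrative value
	}
}

func main() {
	ipfs, _ := New(IPFSConfig())
	// A separate network would pass its own protocol ID and bootstrappers.
	custom, _ := New(Config{ProtocolID: "/example/kad/1.0.0", BucketSize: 20})
	fmt.Println(ipfs.cfg.ProtocolID, custom.cfg.ProtocolID)
}
```

Keeping the IPFS layer to a function that returns a Config is one way to make sure no request-handling code ends up duplicated between repos.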
You are right that we should pay attention to libp2p/go-libp2p-kad-dht#846, but I doubt we will be able to tackle the weird dependency chain (libp2p DHT network -> IPFS DHT network -> IPFS DHT implementation -> libp2p DHT implementation) before the Composable DHT.
> However, note that this means that it is likely that many PRs to modify DHT behavior will end up as multiple PRs, with the associated overhead of bubbling, as has been flagged previously as the cost of having a separate repo here.
Yes, it is indeed not ideal. go-kademlia's goal is to solve Kademlia routing and to expose a simple interface to its consumers. This interface is simple and generic, allowing the caller to control some parts of the behavior, or to directly implement its own modules against the interfaces defined in go-kademlia. The Kademlia routing interface is not expected to change, so once the repo is functional its interfaces should remain stable. The next potential change would come with the Composable DHT (if go-kademlia is transformed into the new Composable DHT implementation). Alternatively, the Composable DHT could be another repository depending on go-kademlia.
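Here is one way "allowing the caller to control some parts of the behavior" could look in Go, using the query-termination decision from the original issue as the example. RunQuery, QueryFunc and PeerID are hypothetical, and RunQuery is a toy breadth-first walk over a hard-coded table rather than a distance-ordered Kademlia lookup, so only the control flow (the library asks the caller's callback whether to stop) is meant to carry over.

```go
package main

import "fmt"

// PeerID is a placeholder; a generic library would use its own node types.
type PeerID string

// QueryFunc is called with each response and reports whether to stop.
// The caller (e.g. an IPFS or other DHT) owns this decision.
type QueryFunc func(from PeerID, closer []PeerID) (stop bool)

// RunQuery contacts peers breadth-first, learning closer peers from lookup,
// until fn returns true or no peers remain. It returns the contacted peers.
func RunQuery(start []PeerID, lookup map[PeerID][]PeerID, fn QueryFunc) []PeerID {
	queue := append([]PeerID(nil), start...)
	seen := map[PeerID]bool{}
	var contacted []PeerID
	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]
		if seen[p] {
			continue
		}
		seen[p] = true
		contacted = append(contacted, p)
		closer := lookup[p]
		if fn(p, closer) {
			break
		}
		queue = append(queue, closer...)
	}
	return contacted
}

func main() {
	network := map[PeerID][]PeerID{
		"a": {"b", "c"},
		"b": {"d"},
		"c": {"d"},
		"d": nil,
	}
	// Stop as soon as peer "d" has answered.
	got := RunQuery([]PeerID{"a"}, network, func(from PeerID, _ []PeerID) bool {
		return from == "d"
	})
	fmt.Println(got) // [a b c d]
}
```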
Alternatively, if we don't want to have 3 DHT repos (go-kademlia, go-libp2p-kad-dht and boxo/kad-dht), we could merge the libp2p DHT implementation into go-kademlia. One module of go-kademlia could be the libp2p DHT implementation. We could add more implementations, for instance a simulation implementation that we are using to test the protocol, and example implementations showing how to use go-kademlia. The libp2p DHT implementation would then be an example of how to use go-kademlia.
Proposal
Split functionality over 3 repos:
- The focus of go-kademlia remains a generic Kademlia toolkit that can be configured for use by different networks. It provides a more maintainable, better performing and extensible foundation for new ideas like the composable DHT.
- Keep go-libp2p-kad-dht as the home of the libp2p dht, but refactor it to be built in terms of go-kademlia. The result of this refactoring becomes version 2 (go-libp2p-kad-dht/v2).
- Create a package in boxo (for example: routing/dht) that contains the configuration of the libp2p dht for IPFS.
Outcome:
- IPFS applications interacting with the IPFS DHT network (e.g. Kubo) can use boxo
- Projects using libp2p for non-IPFS dhts (e.g. Celestia and others) can use go-libp2p-kad-dht/v2 directly with new parameters
Tasks
- create a v2-develop branch in go-libp2p-kad-dht to track the refactor, eventually to become v2
- rename go-libp2p-kad-dht/IpfsDHT to KadDHT in v2-develop
- make KadDHT configurable with protocol name, bootstrappers, validators, network constants, expiration times, routing table refresh intervals, etc.
- message formats and server behaviour are implemented in go-libp2p-kad-dht/v2
- refactor go-kademlia to remove the dependency on go-libp2p. Functional dependencies such as Libp2pEndpoint move to go-libp2p-kad-dht/v2, whereas constant/type dependencies like Connectedness are replaced with local equivalents. go-kademlia focuses on the Kademlia algorithm implementation, searches, event queue management and peer routing
- Create a DHT type in boxo that instantiates KadDHT with IPFS-specific options. It requires little configuration and has sensible defaults. Applications requiring more control can use KadDHT directly.
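The Connectedness task could look roughly like this: go-kademlia declares its own type mirroring NotConnected, Connected, CanConnect and CannotConnect from go-libp2p's network.Connectedness, and only go-libp2p-kad-dht/v2, which already imports go-libp2p, would convert between the two. Endpoint and fakeEndpoint are hypothetical; the fake shows how a simulation could satisfy the interface without any libp2p code.

```go
package main

import "fmt"

// Connectedness is a go-kademlia-local replacement for go-libp2p's
// network.Connectedness, so the core library need not import go-libp2p.
type Connectedness int

const (
	NotConnected Connectedness = iota
	Connected
	CanConnect
	CannotConnect
)

func (c Connectedness) String() string {
	switch c {
	case NotConnected:
		return "NotConnected"
	case Connected:
		return "Connected"
	case CanConnect:
		return "CanConnect"
	case CannotConnect:
		return "CannotConnect"
	default:
		return fmt.Sprintf("Connectedness(%d)", int(c))
	}
}

// Endpoint is a hypothetical transport-agnostic interface; a libp2p-backed
// implementation would live in go-libp2p-kad-dht/v2.
type Endpoint interface {
	Connectedness(peer string) Connectedness
}

// fakeEndpoint is an in-memory Endpoint; unknown peers are NotConnected.
type fakeEndpoint map[string]Connectedness

func (f fakeEndpoint) Connectedness(peer string) Connectedness { return f[peer] }

func main() {
	var ep Endpoint = fakeEndpoint{"alice": Connected}
	fmt.Println(ep.Connectedness("alice"), ep.Connectedness("bob")) // Connected NotConnected
}
```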
@iand it makes a lot of sense to me!
A few minor remarks:
- We may want to keep the KadDHT module, and Libp2pEndpoint, server, message format, etc. in go-kademlia (e.g. in the example folder) while we are actively working on them, as the interfaces may change slightly during development. If we already split the code, we would have to do one PR in each repo when updating an interface. Once we are happy with the KadDHT implementation, we can move it and its associated modules to go-libp2p-kad-dht/v2.
- IMO having examples for modules in go-kademlia can be useful, especially if they are generic enough. For instance, Libp2pEndpoint seems generic enough that it could be used as a message endpoint in Kademlia implementations other than KadDHT. And generally, I think it is good to have examples of how to implement an interface in the same repo.

KadDHT could also be named Libp2pDHT.
Agree that prototyping in the example folder can make sense (although, practically speaking, go.work files make cross-module development trivial).
Libp2pDHT seems redundant to me since it's in a libp2p repository/module. libp2p/go-libp2p-kad-dht#337 suggests naming it Kad. I think KadDHT makes it clear that it's a Kademlia DHT rather than something like Chord 😄
Closing as resolved