Consolidate IPFS Repositories
guseggert opened this issue · 32 comments
Description
Problem
The go-ipfs dependency closure includes 47 modules under github.com/ipfs. Here are their interdependencies (this does not include libp2p nor other PL orgs):
Pain
- Changes must be propagated across many repos, in the right order
- Repos are best-effort to maintain and keep up-to-date, leading to complex dependency graphs due to different versions floating around
- It's difficult to get feedback about whether a change is safe for consumers of the code, due to being in different repos w/ different CI
- In some cases, this discourages experimentation since it can be hard to bubble changes up to end-user applications like go-ipfs
Current Desirable Properties
- Experimental code can easily mix-and-match functionality from go-ipfs
- The dependency graph of the consumer does not include every transitive dependency of go-ipfs
Why are repos structured this way?
The intention of the current layout was to encourage flexibility, extensibility, and experimentation. Functionality of IPFS could be reused in other projects without depending on IPFS as a whole.
Also these repos predate most Go tooling.
How much does a repo cost?
Repo maintenance costs include:
- Keeping dependencies up-to-date
- This is non-trivial as it often requires chasing down other dependencies in the dependency graph...mostly we don't do this until we have to
- Releasing new versions as necessary
- Making sure CI is still working
- Migrating from Travis/CircleCI to Actions (still in progress)
- Rolling out unified CI
- Backporting changes across major versions as necessary
- Manually testing impact of new code changes on downstream consumers
- Monitoring issue trackers, PRs, etc.
- Updating submodules
- Commonly used for testing, example code, etc.
- Often these contain circular module dependencies which complicate propagating breaking changes
Why now? What's changed?
We have an increasing amount of:
- Repos
- See maintenance costs above
- Some are in various states of deprecation, which adds to the maintenance costs and the cost of implementing new features
- Some don't build due to flaky tests, with not enough incentive to fix them until it becomes a blocker
- Projects
- Often these result in backwards-incompatible changes, sometimes even new major versions, which then need to be propagated around to all the downstream repos
- finding those repos can be difficult (e.g. backporting across versions, in-flight work, etc.)
- Increase in # in-flight projects means we're more likely to have repos in transient broken states which block/slow the progress of other projects (this happens often)
Also, Go modules now exist, along with module graph pruning. The latter is key to preventing consumers from having an explosion of transitive dependencies if they just want to reuse some small piece of code.
How can we consolidate repos? What's the ideal end state?
We want our repo layout to facilitate day-to-day development, while also letting us reuse components and functionality. Code that is commonly changed and built together should be in the same repo (as much as possible), so that it can be tested and released together.
We can leverage some of the new tooling around Go modules to retain the flexibility of separate repos, without having to pay the significant cost.
The ideal repo layout:
- go-libipfs
- Roll up most repos that start with github.com/ipfs/go-*
- Build produces no binaries
- Contains no Go submodules
- Includes all supported "official" interfaces and implementations
- Unsupported and experimental code can live elsewhere; once it "graduates" it is moved into the go-libipfs repo for long-term maintenance
- High code quality bar
- Careful consideration of cross-package dependencies
- Consumes other libs like IPLD, multiformats modules, libp2p, etc.
- go-libdatastore
- Datastore interfaces and supported implementations
- This is its own repo to avoid circular dependencies with libp2p
- TODO can libp2p be refactored to remove the circular dependency? Also, Go tolerates circular module dependencies, so why specifically is that bad?
- (list of reasons added by mvdan)
- Impossible to require one module without the other, in either direction.
- Updating both modules becomes a trickier dance: modify A, modify B, update A's dependency on B, update B's dependency on A
- The module dependency graph becomes a "downward spiral" bouncing between A and B, meaning your dependency graph will grow over time
- go-ipfs
- Thin layer that consumes go-libipfs and produces the ipfs binary
- Could be some other name for the Go IPFS implementation
- go-ipfs-gateway
- Experimental gateway implementation that also consumes go-libipfs
Other consumers of go-libipfs include libp2p (datastore), Filecoin, IPFS Cluster, ipfs-lite, and the IPFS examples.
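To make the circular-dependency concern listed under go-libdatastore concrete, here is a minimal sketch of how mutual requirements between modules could be detected. It assumes the module graph has already been loaded into a map (for example from `go mod graph` output); the module names and the `mutualPairs` helper are hypothetical and not part of any existing tool.

```go
package main

import (
	"fmt"
	"sort"
)

// reachable returns every module that start requires, directly or transitively.
func reachable(graph map[string][]string, start string) map[string]bool {
	seen := map[string]bool{}
	stack := append([]string(nil), graph[start]...)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[n] {
			continue
		}
		seen[n] = true
		stack = append(stack, graph[n]...)
	}
	return seen
}

// mutualPairs lists module pairs that (transitively) require each other.
func mutualPairs(graph map[string][]string) [][2]string {
	nodes := make([]string, 0, len(graph))
	for n := range graph {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes) // stable output regardless of map iteration order

	reach := make(map[string]map[string]bool, len(nodes))
	for _, n := range nodes {
		reach[n] = reachable(graph, n)
	}

	var pairs [][2]string
	for i, a := range nodes {
		for _, b := range nodes[i+1:] {
			if reach[a][b] && reach[b][a] {
				pairs = append(pairs, [2]string{a, b})
			}
		}
	}
	return pairs
}

func main() {
	// Hypothetical module names; an edge means "requires".
	graph := map[string][]string{
		"example.com/a": {"example.com/b"},
		"example.com/b": {"example.com/a", "example.com/c"},
		"example.com/c": nil,
	}
	for _, p := range mutualPairs(graph) {
		fmt.Printf("%s <-> %s\n", p[0], p[1])
	}
}
```

A pair reported here is exactly the situation described above: neither module can be required without the other, and updating one means touching both.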
What about consumers of repos we want to remove/archive? How do we roll this out?
go-libp2p did something similar a couple years ago, largely avoiding breaking consumers by shimming out existing repos to point to the consolidated one, example: https://github.com/libp2p/go-libp2p-protocol/blob/master/protocol.go
We can use this same trick to incrementally consolidate without breaking consumers.
See e.g. this PoC of moving go-namesys into go-ipfs while preserving backwards compatibility (in reality we'd move it to go-libipfs):
There may be some cases where this isn't possible without breaking changes.
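As a rough illustration of the shimming trick, the sketch below uses Go type aliases so that the old name keeps working while the code lives in one place. In a real migration the two halves would sit in different modules (the old repo importing the consolidated one); they are collapsed into a single file here so the example runs on its own, and every name in it (`Resolver`, `LegacyResolver`, and so on) is hypothetical.

```go
package main

import "fmt"

// ---- consolidated package (hypothetical new home of the code) ----

// Resolver maps names to paths. It stands in for any type that moved.
type Resolver struct {
	entries map[string]string
}

// NewResolver returns an empty Resolver.
func NewResolver() *Resolver {
	return &Resolver{entries: map[string]string{}}
}

// Publish records a name.
func (r *Resolver) Publish(name, path string) { r.entries[name] = path }

// Resolve looks a name up.
func (r *Resolver) Resolve(name string) (string, bool) {
	p, ok := r.entries[name]
	return p, ok
}

// ---- old package, reduced to a shim ----

// LegacyResolver is an alias, not a new type, so values flow freely
// between code using the old name and code using the new one.
//
// Deprecated: use Resolver.
type LegacyResolver = Resolver

// NewLegacyResolver forwards to the consolidated constructor.
//
// Deprecated: use NewResolver.
func NewLegacyResolver() *LegacyResolver { return NewResolver() }

func main() {
	old := NewLegacyResolver()
	old.Publish("example", "/ipfs/placeholder")

	// This assignment compiles only because the alias is the same type.
	var current *Resolver = old
	path, ok := current.Resolve("example")
	fmt.Println(path, ok)
}
```

Aliases cover types; functions, constants, and variables need explicit forwarding declarations, and anything that cannot be forwarded this way is where breaking changes may be unavoidable.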
2021-11-05 notes
- Caution on ipfslib being a kitchen sink.
- Goal is: development velocity? or code quality barriers?
- development velocity!
- Critical things for actioning
- Possible prep work: get everything in the graph to point at one version of each module. (May reduce risk of problems being discovered mid-consolidation.)
- List of modules
- Can we get this as far as a csv (or whatever) file with a list of {oldimportpath}, {newimportpath}, {optionalReasonNotToMove}, and review that up-front?
- Incrementalism:
- Do not want to do interface changes during consolidation. Move should be mechanical renaming of import paths, only.
- Avoiding interdependency within the monorepo - having the tooling to detect
- Handling CI
- Tenets for merging in to lib
- Anything merging in is using latest version
- No flaky tests - if there are any, disable them and file an issue in lib-ipfs
- Action items
- Identify package order: leaves up
- Scripting for validating "merge-in tenets"
- Scripting for determining interdependency
- Figuring out a way to not lose commit history
- Create runbook for moving a repo into lib-ipfs
- Handling old PRs - likely close them (but notify them)
- Handling old issues
- Example:
- If less than 20 look at them
- If more, evaluate?
- Run script to copy over commit history
- Archive the repo
- Callout: some should move into go-ipfs (example: go-ipfs-config)
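The notes above ask for a reviewable list of {oldimportpath}, {newimportpath}, {optionalReasonNotToMove} and for moves to be purely mechanical import-path renames. A minimal sketch of how those two ideas could fit together follows. The CSV rows, the new import paths, and the helper names are all made up for illustration, and rows that carry a reason are assumed to mean "do not move".

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"go/format"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// move is one row: {oldimportpath}, {newimportpath}, {optionalReasonNotToMove}.
type move struct{ oldPath, newPath, reason string }

// parseMoves reads the list; the third column may be missing.
func parseMoves(data string) ([]move, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = -1 // allow two- or three-column rows
	r.TrimLeadingSpace = true
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	var moves []move
	for _, row := range rows {
		if len(row) < 2 {
			return nil, fmt.Errorf("row %q: need old and new import path", row)
		}
		m := move{oldPath: row[0], newPath: row[1]}
		if len(row) > 2 {
			m.reason = row[2]
		}
		moves = append(moves, m)
	}
	return moves, nil
}

// mapPath returns the new import path, or path unchanged if no row applies.
// Rows with a reason are skipped (assumed to mean "do not move").
func mapPath(path string, moves []move) string {
	for _, m := range moves {
		if m.reason != "" {
			continue
		}
		if path == m.oldPath || strings.HasPrefix(path, m.oldPath+"/") {
			return m.newPath + strings.TrimPrefix(path, m.oldPath)
		}
	}
	return path
}

// rewriteImports rewrites only the import paths of one Go source file.
func rewriteImports(src string, moves []move) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return "", err
		}
		imp.EndPos = imp.End() // pin the end position before the literal changes length
		imp.Path.Value = strconv.Quote(mapPath(path, moves))
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Both the rows and the new import paths are illustrative only.
	moves, err := parseMoves("github.com/ipfs/tar-utils, github.com/ipfs/go-libipfs/tar\n" +
		"github.com/ipfs/go-datastore, github.com/ipfs/go-libipfs/datastore, used by libp2p\n")
	if err != nil {
		panic(err)
	}
	src := "package demo\n\nimport (\n\tds \"github.com/ipfs/go-datastore\"\n\ttar \"github.com/ipfs/tar-utils\"\n)\n"
	out, err := rewriteImports(src, moves)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The sample imports are aliased on purpose: a path rename does not change the package name declared inside the moved code, so unaliased imports would also need checking when the last path element changes.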
Scripting for determining interdependency
Assuming this is about avoiding coupling within the monorepo, I'm happy to help with this bit; it should be a fairly straightforward Go test, e.g. in the root package.
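One possible shape for such a check, as a sketch only: given each package's import list, flag any import that stays inside the module and is not on an explicit allowlist. In a real test the map would presumably be built from `go list -json ./...` (its `ImportPath` and `Imports` fields); here it is hard-coded, and the module path, package names, and `disallowedImports` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// disallowedImports returns "importer -> imported" for every import that
// stays inside modulePath and is not listed in allowed.
func disallowedImports(imports map[string][]string, modulePath string, allowed map[string]bool) []string {
	var bad []string
	for pkg, deps := range imports {
		for _, dep := range deps {
			internal := dep == modulePath || strings.HasPrefix(dep, modulePath+"/")
			if !internal {
				continue // standard library and third-party imports are not checked
			}
			if edge := pkg + " -> " + dep; !allowed[edge] {
				bad = append(bad, edge)
			}
		}
	}
	sort.Strings(bad) // map iteration order is random; sort for stable output
	return bad
}

func main() {
	// Hypothetical packages inside a hypothetical monorepo module.
	imports := map[string][]string{
		"example.com/lib/bitswap":    {"fmt", "example.com/lib/blockstore"},
		"example.com/lib/blockstore": {"strings", "example.com/lib/bitswap/message"},
	}
	allowed := map[string]bool{
		"example.com/lib/bitswap -> example.com/lib/blockstore": true,
	}
	for _, edge := range disallowedImports(imports, "example.com/lib", allowed) {
		fmt.Println("unexpected coupling:", edge)
	}
}
```

An allowlist makes every new cross-package edge a deliberate, reviewed change rather than something that creeps in unnoticed.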
Some useful data:
A topological ordering of github.com/ipfs/* dependencies of go-ipfs, from leaves to root:
$ go mod graph | tsort 2>/dev/null | cut -d' ' -f 1 | grep 'github.com/ipfs' | cut -d'@' -f 1 | awk '!x[$0]++' | tac
github.com/ipfs/go-detect-race
github.com/ipfs/go-metrics-interface
github.com/ipfs/go-ipfs-delay
github.com/ipfs/go-ipfs-blocksutil
github.com/ipfs/go-ipfs-exchange-interface
github.com/ipfs/go-ipfs-routing
github.com/ipfs/go-cid
github.com/ipfs/go-ipfs-exchange-offline
github.com/ipfs/go-verifcid
github.com/ipfs/bbloom
github.com/ipfs/go-ipfs-ds-help
github.com/ipfs/go-ipfs-pq
github.com/ipfs/go-ipfs-util
github.com/ipfs/go-ipfs-posinfo
github.com/ipfs/go-ds-leveldb
github.com/ipfs/go-block-format
github.com/ipfs/go-ipfs-blockstore
github.com/ipfs/go-log
github.com/ipfs/go-ipld-format
github.com/ipfs/go-ipld-cbor
github.com/ipfs/go-ipld-legacy
github.com/ipfs/go-fetcher
github.com/ipfs/go-peertaskqueue
github.com/ipfs/go-log/v2
github.com/ipfs/go-ds-badger
github.com/ipfs/go-ipfs-chunker
github.com/ipfs/go-cidutil
github.com/ipfs/go-blockservice
github.com/ipfs/go-merkledag
github.com/ipfs/go-datastore
github.com/ipfs/go-ipns
github.com/ipfs/go-bitswap
github.com/ipfs/go-ds-flatfs
github.com/ipfs/go-ds-measure
github.com/ipfs/go-filestore
github.com/ipfs/go-fs-lock
github.com/ipfs/go-graphsync
github.com/ipfs/go-ipfs-cmds
github.com/ipfs/go-ipfs-config
github.com/ipfs/go-ipfs-files
github.com/ipfs/go-ipfs-keystore
github.com/ipfs/go-ipfs-pinner
github.com/ipfs/go-ipfs-provider
github.com/ipfs/go-ipld-git
github.com/ipfs/go-metrics-prometheus
github.com/ipfs/go-mfs
github.com/ipfs/go-namesys
github.com/ipfs/go-path
github.com/ipfs/go-pinning-service-http-client
github.com/ipfs/go-unixfs
github.com/ipfs/go-unixfsnode
github.com/ipfs/interface-go-ipfs-core
github.com/ipfs/tar-utils
github.com/ipfs/go-ipfs
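For anyone who would rather not depend on `tsort`, the same leaves-first ordering can be sketched in Go. This is an approximation of the shell pipeline above, not a drop-in replacement: it reads `go mod graph`-style lines, strips versions, orders the whole graph with a depth-first post-order walk, and only then filters by prefix. Ties may be ordered differently than `tsort` orders them, and the sample input is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseGraph turns "module@version requirement@version" lines into a
// version-less adjacency map (module -> requirements).
func parseGraph(graphOutput string) map[string][]string {
	graph := map[string][]string{}
	sc := bufio.NewScanner(strings.NewReader(graphOutput))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		from, _, _ := strings.Cut(fields[0], "@")
		to, _, _ := strings.Cut(fields[1], "@")
		graph[from] = append(graph[from], to)
		if _, ok := graph[to]; !ok {
			graph[to] = nil // make sure leaves are nodes too
		}
	}
	return graph
}

// leavesFirst returns all modules so that requirements come before the
// modules that require them. Marking a node before descending means a
// cycle is simply cut at the point where it is re-entered.
func leavesFirst(graph map[string][]string) []string {
	nodes := make([]string, 0, len(graph))
	for n := range graph {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes)

	visited := map[string]bool{}
	var order []string
	var visit func(string)
	visit = func(n string) {
		if visited[n] {
			return
		}
		visited[n] = true
		deps := append([]string(nil), graph[n]...)
		sort.Strings(deps)
		for _, d := range deps {
			visit(d)
		}
		order = append(order, n)
	}
	for _, n := range nodes {
		visit(n)
	}
	return order
}

// withPrefix keeps only the modules whose path starts with prefix.
func withPrefix(mods []string, prefix string) []string {
	var out []string
	for _, m := range mods {
		if strings.HasPrefix(m, prefix) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Made-up sample; real input would be the output of `go mod graph`.
	sample := `example.com/app github.com/ipfs/go-b@v0.1.0
github.com/ipfs/go-b@v0.1.0 github.com/ipfs/go-a@v0.0.1
github.com/ipfs/go-b@v0.1.0 example.com/other@v1.0.0
example.com/other@v1.0.0 github.com/ipfs/go-c@v0.0.2`
	for _, m := range withPrefix(leavesFirst(parseGraph(sample)), "github.com/ipfs/") {
		fmt.Println(m)
	}
}
```

Sorting the whole graph before filtering matters: an ipfs module can depend on another ipfs module only through a non-ipfs one, and filtering first would lose that ordering constraint.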
List of those modules and their versions in the go-ipfs dependency closure (sorted by # versions):
github.com/ipfs/go-detect-race
v0.0.1
github.com/ipfs/go-metrics-interface
v0.0.1
github.com/ipfs/go-ipfs-exchange-interface
v0.0.1
github.com/ipfs/go-ipfs-blocksutil
v0.0.1
github.com/ipfs/go-verifcid
v0.0.1
github.com/ipfs/go-ipfs-exchange-offline
v0.0.1
github.com/ipfs/go-ipld-legacy
v0.1.0
github.com/ipfs/go-ipfs-posinfo
v0.0.1
github.com/ipfs/go-fetcher
v1.5.0
github.com/ipfs/go-ipns
v0.1.2
github.com/ipfs/go-cidutil
v0.0.2
github.com/ipfs/tar-utils
v0.0.1
github.com/ipfs/go-pinning-service-http-client
v0.1.0
github.com/ipfs/go-namesys
v0.4.0
github.com/ipfs/go-mfs
v0.1.2
github.com/ipfs/go-metrics-prometheus
v0.0.2
github.com/ipfs/go-ipld-git
v0.1.1
github.com/ipfs/go-ipfs-provider
v0.6.1
github.com/ipfs/go-ipfs-pinner
v0.1.2
github.com/ipfs/go-ipfs-keystore
v0.0.2
github.com/ipfs/go-ipfs-config
v0.16.0
github.com/ipfs/go-ipfs-cmds
v0.6.0
github.com/ipfs/go-graphsync
v0.8.0
github.com/ipfs/go-fs-lock
v0.0.7
github.com/ipfs/go-filestore
v0.0.3
github.com/ipfs/go-ds-measure
v0.1.0
github.com/ipfs/go-ds-flatfs
v0.4.5
github.com/ipfs/go-ipfs-delay
v0.0.0-20181109222059-70721b86a9a8 v0.0.1
github.com/ipfs/go-ipfs-util
v0.0.1 v0.0.2
github.com/ipfs/go-ipfs-pq
v0.0.1 v0.0.2
github.com/ipfs/bbloom
v0.0.1 v0.0.4
github.com/ipfs/go-ipfs-ds-help
v0.0.1 v0.1.1
github.com/ipfs/go-ipfs-routing
v0.0.1 v0.1.0
github.com/ipfs/go-ipfs-chunker
v0.0.1 v0.0.5
github.com/ipfs/go-unixfsnode
v1.1.2 v1.1.3
github.com/ipfs/interface-go-ipfs-core
v0.4.0 v0.5.1
github.com/ipfs/go-block-format
v0.0.1 v0.0.2 v0.0.3
github.com/ipfs/go-ipld-format
v0.0.1 v0.0.2 v0.2.0
github.com/ipfs/go-ipfs-files
v0.0.3 v0.0.8 v0.0.9
github.com/ipfs/go-unixfs
v0.1.0 v0.2.4 v0.2.5
github.com/ipfs/go-path
v0.0.7 v0.1.1 v0.1.2
github.com/ipfs/go-ds-leveldb
v0.0.1 v0.1.0 v0.4.1 v0.4.2
github.com/ipfs/go-ipfs-blockstore
v0.0.1 v0.1.0 v0.1.4 v0.1.6
github.com/ipfs/go-ipld-cbor
v0.0.2 v0.0.3 v0.0.4 v0.0.5
github.com/ipfs/go-log
v0.0.1 v1.0.2 v1.0.3 v1.0.4 v1.0.5
github.com/ipfs/go-peertaskqueue
v0.0.4 v0.1.0 v0.1.1 v0.2.0 v0.4.0
github.com/ipfs/go-log/v2
v2.0.2 v2.0.3 v2.0.5 v2.1.1 v2.1.3 v2.3.0
github.com/ipfs/go-blockservice
v0.0.7 v0.1.0 v0.1.1 v0.1.3 v0.1.4 v0.1.7
github.com/ipfs/go-cid
v0.0.1 v0.0.2 v0.0.3 v0.0.4 v0.0.5 v0.0.6 v0.0.7
github.com/ipfs/go-ds-badger
v0.0.2 v0.0.5 v0.0.7 v0.2.1 v0.2.3 v0.2.6 v0.2.7
github.com/ipfs/go-bitswap
v0.0.9 v0.1.0 v0.1.2 v0.1.3 v0.1.8 v0.3.4 v0.4.0
github.com/ipfs/go-merkledag
v0.0.6 v0.1.0 v0.2.3 v0.3.0 v0.3.1 v0.3.2 v0.4.0
github.com/ipfs/go-datastore
v0.0.1 v0.0.5 v0.1.0 v0.1.1 v0.3.0 v0.3.1 v0.4.0 v0.4.1 v0.4.4 v0.4.5 v0.4.6
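A list like the one above can be regenerated with a few lines of Go. The sketch below assumes the data comes from `go mod graph` output, where every module except the main one appears as `path@version`. The sample input is made up, and versions are sorted as plain strings rather than by semver precedence.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// versionsByModule collects the distinct versions of each module under
// prefix that appear anywhere in `go mod graph`-style output.
func versionsByModule(graphOutput, prefix string) map[string][]string {
	seen := map[string]map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(graphOutput))
	for sc.Scan() {
		for _, field := range strings.Fields(sc.Text()) {
			mod, ver, ok := strings.Cut(field, "@")
			if !ok || !strings.HasPrefix(mod, prefix) {
				continue // the main module carries no version
			}
			if seen[mod] == nil {
				seen[mod] = map[string]bool{}
			}
			seen[mod][ver] = true
		}
	}
	out := make(map[string][]string, len(seen))
	for mod, vers := range seen {
		for v := range vers {
			out[mod] = append(out[mod], v)
		}
		sort.Strings(out[mod]) // plain string order, not semver-aware
	}
	return out
}

// byVersionCount orders module paths by number of versions, then by name.
func byVersionCount(versions map[string][]string) []string {
	mods := make([]string, 0, len(versions))
	for m := range versions {
		mods = append(mods, m)
	}
	sort.Slice(mods, func(i, j int) bool {
		a, b := mods[i], mods[j]
		if len(versions[a]) != len(versions[b]) {
			return len(versions[a]) < len(versions[b])
		}
		return a < b
	})
	return mods
}

func main() {
	// Made-up sample in the "module@version requirement@version" line format.
	sample := `example.com/app github.com/ipfs/go-cid@v0.0.7
github.com/ipfs/go-cid@v0.0.7 github.com/ipfs/go-log@v1.0.5
example.com/x@v1.0.0 github.com/ipfs/go-cid@v0.0.1`
	versions := versionsByModule(sample, "github.com/ipfs/")
	for _, m := range byVersionCount(versions) {
		fmt.Println(m, strings.Join(versions[m], " "))
	}
}
```

Modules with more than one version are the ones the "point everything at one version" prep work from the earlier notes would need to address first.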
Script I'll use for transferring issues: https://gist.github.com/guseggert/b9622b794b0886d66e8fbf8f234ca709
I'm going to try out moving github.com/ipfs/tar-utils to go-libipfs, to see if there are any surprises. tar-utils is a good repo to do this with because there's only one version and nothing/nobody except go-ipfs consumes it, so if we mess something up it's easy to undo.
Another interesting one is go-cidutil, because it also has a CLI, and I don't know if we can preserve backwards compat with folks doing go get on a CLI, so I want to find that out.
For now I'm not going to touch go-datastore and datastore implementations, since they are used by libp2p. We should eventually see if it's feasible to consolidate the interfaces and officially supported implementations into one repo, but we can revisit that after addressing the lower hanging fruit. Right now the focus is on consolidating all the random repos that are close to go-ipfs.
I don't know if we can preserve backwards compat with folks doing go get on a CLI
At the moment, assuming you change the module path, the best you can do is freeze/archive the old repo and just let the existing users keep using old versions. That's as far as we can go in terms of not "breaking" them, even though hiding newer versions from them can be a form of breakage if they use @latest.
There are middle grounds perhaps, such as publishing one more version in the old module that adds something like:
func init() { println("this tool has moved: go install github.com/new-org/new-repo/cmd/foo") }
The right solution is https://go-review.googlesource.com/c/proposal/+/335849, but that's still a draft proposal right now, so we wouldn't be able to rely on it for at least another 12-18 months until it's shipped in a Go release. What that would get you is full forwarding - users doing go get or go install on the old path would be forwarded to the new one transparently.
I'm going to provide the contrarian opinion here, and aim to keep the status quo.
I think the reasons are a bit exaggerated. I am personally fine with the current layout as the purpose and interface of every different repo is very clear. It is possible to follow what happens on every different piece very easily. Dependabot makes the process painless.
Repo maintenance costs include:
- Keeping dependencies up-to-date
- This is non-trivial as it often requires chasing down other dependencies in the dependency graph...mostly we don't do this until we have to
Dependabot makes this trivial
- Releasing new versions as necessary
One commit. Not having to wait until everything is aligned in a huge repo to publish a module release and notify the world about it.
- Making sure CI is still working
Well, single repo or multi-repo, for anything you need to do it is necessary to ensure CI still works. In fact, CI in bigger repos is way worse and tends to be broken way more often (see go-ipfs).
- Migrating from Travis/CircleCI to Actions (still in progress)
Are we in a hurry? I think travis keeps working.
- Rolling out unified CI
Was done already? It is just more lines in a config file and a bot does it?
- Backporting changes across major versions as necessary
We don't do major versions except in a very reduced number of places.
- Manually testing impact of new code changes on downstream consumers
Monorepo or single repo, same responsibility.
- Monitoring issue trackers, PRs, etc.
Absolutely the same. go-ipfs centralizes most issues and many PRs and that doesn't mean they are tended to any better.
- Updating submodules
- Commonly used for testing, example code, etc.
- Often these contain circular module dependencies which complicate propagating breaking changes
They don't. But they did before these repos were extracted from go-ipfs. The fact that these repos are separate is actually an assurance that the dependency graph is sane. At least this is not "often".
go-libp2p did something similar a couple years ago
what libp2p did is way more lightweight than what is proposed here. Libp2p created a "core" repo that contains interfaces and data types, but MOST individual repositories still exist (check your libp2p dependency graph).
I understand that landing on "ipfs" and seeing 50 repos (+ other 50 for libp2p) is a very daunting thing. A big "WTF, how on earth did we get to this". Consolidation may be a perfectly sane thing, but it also causes a few issues that are not there:
- Breaking things
- Unable to track individual releases and changelogs from individual subcomponents
- Module history gets mixed with history from 50 other modules
- Suddenly it becomes way easier to merge modules without thinking carefully about what functionality they offer and what the public interface for it should be. You can see examples of this in repositories that should actually be broken down further: for example, unixfs, which has unixfs-specific stuff, but also chunkers and dag-builders submodules, which conceptually at least should provide independent functionality.
- Do something like bringing go-datastore, go-ds-flatfs, go-ds-badger together, and suddenly your application has to pull dozens of dependencies to just run a map Datastore (does graph pruning fix that?).
- Applications cannot choose to incorporate new breaking versions of something and update the application to work. They need to wait until the monorepo releases things. And the monorepo can only release things when all the functionality is aligned, even though the actual application does not need all the functionality. See the interdependencies between libp2p-core, libp2p and the different repos. It is impossible to adapt my application to use a new version of a module if go-libp2p and go-libp2p-core have not done it, because I am forced to import go-libp2p and core for everything, and they in turn import everything else and one cannot simply release libp2p or core like you can release a small module. IPFS-land is way more flexible with publication of breaking changes (of course it has some problems that a monorepo doesn't).
I want to think some of the consolidation proposed makes sense, but certainly there is room between going from 50 to 3, and reducing less, or just consolidating types and interfaces in one place (ala libp2p).
Disclaimer: I created dozens of these individual repos and extracted code so that it could be re-used independently, so that building a fully functional IPFS application did not require importing go-ipfs, and you could actually pick and use versions of each module as needed, without being forced by go-ipfs to use a defined set.
I am a huge fan of monorepos.
Right now I can't use the as-a-library example because of cidutil :/
Working on this and I'll share code if I don't make progress. Basically ipfs is perfect for downloading chunked []byte blockchain state.
For visibility, some initial experiments are happening in https://github.com/ipfs/libkubo/pull/1
We did not want to namesquat on "IPFS" with "libipfs", like we did with go-ipfs, but found "libkubo" to be even more confusing since its intention is to have code that is reusable and not Kubo-specific, which its name belies. Since this is a library, not an implementation, we don't think "libipfs" has the same problem that led to renaming "go-ipfs" to "kubo". So we have moved back to "libipfs", you can find the repo here: https://github.com/ipfs/go-libipfs
I have written some tools to ease this migration: https://github.com/guseggert/repo-migration-tools. These tools:
- Move code from one repo to another in a subdir
- Preserving commit history
- Fixing broken merge commit links
- Adding a note at the bottom of each commit with where the commit originally came from
- Transfer GitHub issues from one repo to another, adding the repo as a prefix to the issue name
- Leave a note on open PRs and closing them (since it is not feasible to transfer PRs)
You can find a generic checklist for moving a repo into go-libipfs in the Example Workflow section of the README.
Flagging some where some thought is going to be needed:
Things that are highly depended on:
- github.com/ipfs/go-cid
- github.com/ipfs/go-datastore
- github.com/ipfs/go-ds-*
- github.com/ipfs/go-log
Things that aren't 'ipfs owned':
- github.com/ipfs/go-graphsync
- github.com/ipfs/go-ipld-format
Agreed ^^ that list was not intended as an official todo list for libipfs, I should have made that more clear.
Here's my proposed list of repos we should definitely migrate, repos that are unclear, and repos definitely not to migrate to go-libipfs
To definitely migrate, in the order of migration:
- github.com/ipfs/interface-go-ipfs-core
- github.com/ipfs/go-unixfs
- github.com/ipfs/go-pinning-service-http-client
- github.com/ipfs/go-path
- github.com/ipfs/go-namesys
- github.com/ipfs/go-mfs
- github.com/ipfs/go-ipfs-provider
- github.com/ipfs/go-ipfs-pinner
- github.com/ipfs/go-ipfs-keystore
- github.com/ipfs/go-ipfs-files
- github.com/ipfs/go-ipfs-config
- github.com/ipfs/go-ipfs-cmds
- github.com/ipfs/go-fs-lock
- github.com/ipfs/go-filestore
- github.com/ipfs/go-bitswap
- github.com/ipfs/go-ipns
- github.com/ipfs/go-blockservice
- github.com/ipfs/go-ipfs-chunker
- github.com/ipfs/go-peertaskqueue
- github.com/ipfs/go-fetcher
- github.com/ipfs/go-ipfs-blockstore
- github.com/ipfs/go-block-format
- github.com/ipfs/go-ipfs-posinfo
- github.com/ipfs/go-ipfs-util
- github.com/ipfs/go-ipfs-pq
- github.com/ipfs/go-ipfs-ds-help
- github.com/ipfs/bbloom
- github.com/ipfs/go-verifcid
- github.com/ipfs/go-ipfs-exchange-offline
- github.com/ipfs/go-ipfs-routing
- github.com/ipfs/go-ipfs-exchange-interface
- github.com/ipfs/go-ipfs-blocksutil
- github.com/ipfs/go-ipfs-delay
- github.com/ipfs/go-detect-race
Repos that I'm unclear about:
- IPLD repos
- These are unclear for me because of unclear maintenance resources, so if some critical change needs to be made to these libraries, it is likely to fall on our shoulders. But we can get huge benefit without touching these anyway, so I'd prefer we kick the can and revisit later.
- github.com/ipfs/go-unixfsnode
- github.com/ipfs/go-ipld-git
- github.com/ipfs/go-merkledag
- github.com/ipfs/go-ipld-legacy
- github.com/ipfs/go-ipld-cbor
- github.com/ipfs/go-ipld-format
- github.com/ipfs/go-cid
- Datastore
- I don't think these should be in go-libipfs, but don't feel strongly and could see the counter-argument too
- github.com/ipfs/go-ds-measure
- github.com/ipfs/go-ds-flatfs
- github.com/ipfs/go-ds-badger
- github.com/ipfs/go-ds-leveldb
- github.com/ipfs/go-datastore
- github.com/ipfs/go-cidutil
- I'd advocate for just forking this into go-libipfs since the code is so small
- Instrumentation
- I'd also advocate for forking these into go-libipfs, we currently run various forks already and it's not very important to keep this consistency across so many unrelated applications
- github.com/ipfs/go-metrics-interface
- github.com/ipfs/go-metrics-prometheus
- github.com/ipfs/go-log/v2
- github.com/ipfs/go-log
Repos that should definitely not be moved into go-libipfs:
- github.com/ipfs/go-graphsync
- We seem likely to remove this in Kubo as it is unused, so no reason to move it into libipfs right now. I do think it actually is a conceptual fit for go-libipfs (for the same reason it was added to Kubo) but practically it doesn't make any sense.
As per discussion in ipfs/boxo#36, please don't move go-merkledag, it would be preferable to wean people off it than lock it in stone with neverending releases as if it's best-practice dagpb.
github.com/ipfs/go-log really should be excluded too, it's used almost universally across all our repos as a generic logger.
The fact that you've moved github.com/ipfs/go-block-format is also pretty disruptive. Ideally we wouldn't be relying on it but it's got a deep dependency tree all over the place and I don't see why it makes sense to absorb it into the mega-repo. (tbqh I think this is all much less than ideal and disruptive for everyone but Kubo).
The end goal of this project is not to benefit Kubo devs, that is just a side effect--we want to lower the barrier to entry so that people will use these libraries and refactor/contribute, instead of using Kubo when it's not appropriate or avoiding the ecosystem altogether. The number of repos and their version inconsistencies is overwhelming, even for some of us who work on them full-time. The cost and risk of bubbling changes around between dozens of repos and versions is so high that even PL folks try to avoid making changes to them, which is not a healthy dynamic. The pain of the refactor that you're pointing out is an example of this.
We believe that putting IPFS things in one place (as much as possible/practical) and treating them as one cohesive product, testing them together, and ensuring version consistency will result in a much better experience for other devs who want to build applications and implementations on top of these libraries. There will be pain as we make this transition, and some ambiguities to work out, but I think the end result for the community and users will be more than worth it.
We believe that putting IPFS things in one place ... will result in a much better experience for other devs
You've gotten concerns from other devs on this transition. What signal are you using to track if this ends up being a better experience for us?
Anecdotally, I can say that the ipfs org is harder for my team (and I suspect non-stewards generally) to work in today than it was 18 months ago.
- Previously as a member of the org I could interact with the repos I needed to. Today, it continues to take days and someone to take pity against my passive-aggressive links to gain access to each new repo created that I need to engage with.
- The mono-repo makes it unclear what code is at what stage. It seems there's a desire for a low barrier of entry to merge, but also a sense that code there is being maintained? For instance I remain unclear on what criteria are used for deciding on ipsl being merged, and the mono repo doesn't help with following that sort of discussion or differentiating stages of development. Am I able to add experimental code in the mono repo, or is that just for the stewards team?
- As mentioned already, losing the ability to link against tagged versions of libraries my code depends on is a loss in this setup.
@guseggert to quote you from the all-hands today: "Kubo is becoming a kitchen sink" .. so "we're extracting stuff to go-libipfs". If this were all it is then I think that's a laudable goal. But what's going on here that's causing the rest of us pain is that go-libipfs is becoming the kitchen-sink replacement; it's just a shell game of kitchen sinks. By pulling in existing repos, you're building up a DX that's similar to the Kubo UX—it does too much, all in one place, with no opt-out mechanism.
A lot of the components that have been sucked in here are good for one-off use, the libraries were modular and small enough that they could be pulled in for special-purpose tasks that are IPFS-ish or IPFS-adjacent, but don't require everything else. Now you're forcing us to require everything to get simple things done. Want a peertaskqueue? You need all of go-libipfs. Want to manipulate IPFS-compatible paths? Go get all-the-things and you can do that!
As is our style, we have multiple generations of tools / libraries in our ecosystem; we have trouble putting things down and saying goodbye and telling users that that thing isn't supported and that either there's a replacement or it shouldn't be used at all anymore—this is something we need to get better at. Unfortunately, by baking things into an official go-libipfs, you're making it much much harder to retire components. One of my personal gripes is around the previous generation IPLD tooling. I'd really like all of the Block abstractions to be slowly retired. I think we can do better with our interface to block data and I think our block/data store interfaces could be done much better. A few of us have been actively trying to migrate away from those patterns. By consolidating and etching them in stone here, we're never going to be rid of them and we're edging out room for innovation.
I also think the definition of "IPFS" is a huge problem here, you're essentially saying that it's all of the things in go-libipfs. That's far too expansive and just smells like Kubo's notion of "IPFS". People building their own IPFS on top of go-libipfs basically means building their own Kubo, but perhaps with some features removed. Many of us would prefer that the definition of "IPFS" be much more trim and leave room for significant innovation around it. That's best achieved by having a more decoupled set of components that may or may not be used. I really, really don't want to have to pull in this new beast repo just to get simple things done that might happen to go anywhere near the Kubo-style of "IPFS" and now find myself developing around the deprecation landmines that are being regularly set off.
Another interesting lens to view this through:
What makes a repo in or out of libipfs? I think the answer that we're ending up with is: if the kubo team primarily maintains it, it's in libipfs.
To me this is a weird place to end up for a "general purpose go library for IPFS"
Bitswap and GraphSync seem like two widely deployed data transfer protocols for content addressed data. But I have to import libipfs if I want Bitswap while GraphSync is its own repo. Is one more "IPFS" than the other? What about "go-unixfs" vs "go-unixfsnode" -- both are viable implementations of UnixFS.
This seems like we're ending up with "go-libkubo" despite the original intention for it not to be that. Other libraries are not in this specifically because it would drag development for the other teams that maintain them. This brings us back to "a thing that is supposed to make DX easier is making it harder".
I believe the missing signal here is new developers in the ecosystem, which are the ones who might possibly be helped by less complex repo structure. @rvagg @willscott and I are all devs who are experienced in the existing structure, so this can only be disruptive for us.
I also wonder though if a big repo rearrange is the right approach to "making it easier for new devs". It seems to me that the real barrier to entry for new devs who don't want to just talk to one of the IPFS implementations' HTTP APIs is the lack of step-by-step instructions on how to build a functioning IPFS node from the various Go libraries. I feel like this could be a first step before a big repo re-org. I'm not going to get very far with go-libipfs if I don't have some step-by-step code on how to use it.
Either way, it seems like the arbiter is signal from new devs, so I wonder what the arbiter of that should be.
There’s lots here 😅. I appreciate the feedback. It’s been heard by the team and me. I’m going to do my best to reply here. Much correspondence here comes from conversations and notes with @aschmahmann and @guseggert. (Anything useful or well-said should be credited to them; anything foolish or ignorant is on me.) I want to hear all sides. That said, we need to get over this hump as this is a drain on everyone while in limbo. I suspect we’re getting close to needing to disagree and commit.
General comments
The set of people the go-libipfs maintainers plan to be helping here is primarily people trying to build with IPFS that are currently either giving up or relying on the Kubo HTTP RPC API. Some of these people will be better served by IPFS tooling in other languages (Javascript, Rust, Java, Python, …). Still, for those who are either looking to write in Go or to leverage the set of IPFS tooling we already have in Go, we’d like to make their lives easier. We’d also like to make life easier on ourselves as the maintainers by reducing the maintenance burden that comes from being the owners of many repos and then use that time to contribute more to the community in the form of easier-to-use libraries, better implementations, improved protocols, new protocols, etc. Some of those changes will make their way into Kubo and others will not.
None of us (EngRes IPFS Stewards) likes moving repos around for fun. We’re doing this because we spend time in chat channels, forums, in-person events, and engaging either directly or indirectly with companies and hackathon builders operating on our stack. Many of these people find it hard to build an IPFS implementation with just the parts they need, and some have explicitly flagged the many repos as a problem, so we’re going to try making it easier for them.
Last week alone, I know Adin got pulled into multiple conversations around people not understanding how they can pull in the relevant libraries and get going rather than pulling in all of Kubo. Over the years, there have been a few “lite” implementations that try and pull in some of the basic libraries together. However, these tend to be maintained by one person who has a lot of experience with where all the libraries are or were created with the assistance of someone who has. I don’t think this is a scalable solution for supporting many developers trying to build using IPFS.
If repo consolidation somehow manages to make things worse for both users of our stack and the maintainers, then this endeavor will not have fulfilled its intended purpose. We’re optimistic it’ll improve things for both, though.
Anecdotally, I can say that the ipfs org is harder for my team (and I suspect non-stewards generally) to work in today than it was 18 months ago.
Is this because of the repo access permissions delays you mentioned or something else? If it's not related to repo organization (the topic of this issue), then let's cover it in a different forum. (I'd like to learn more.)
Previously as a member of the org I could interact with the repos I needed to. Today, it continues to take days and someone to take pity against my passive-aggressive links to gain access to each new repo created that I need to engage with.
I haven't been tracking SLAs on ipfs/github-mgmt. I'm game to look into this further (but maybe best to raise an issue in that repo). That said, doesn't it support repo consolidation in a minor way? Instead of asking for permissions in many repos, you would need to ask in far fewer.
For instance I remain unclear on what criteria is used for deciding on ipsl being merged
Am I able to add experimental code in the mono repo, or is that just for the stewards team?
We intend to follow the merge criteria here: https://github.com/ipfs/go-libipfs#should-i-add-my-ipfs-component-to-go-libipfs
At the moment the policy is:
If you have some experimental component that you think would benefit the IPFS community, we suggest you build the component in your own repository until it's clear that there's community demand for it, and then open an issue in this repository to discuss including it in go-libipfs.
There's been some discussion for alternatives if we want to make it easier to add experimental components (e.g., an experimental subpackage). However, there has been no PR to change the policy. If there's going to be a policy change it'll happen there.
I assume some of the confusion/concern here is that this is a PR in go-libipfs, and there is a "master plan / roadmap" for this functionality showing up in go-libipfs. I can see how that could be misinterpreted and need clarification. A couple of callouts:
- The PR is not targeting the main branch. The team discussion was that it could live in a separate branch until it meets the merge criteria above.
- I asked to document the plan for how this work would evolve before we got deep into the code.
I think a clarifying step we could take currently is to move this PR and its issues out into a separate repo.
As mentioned already, losing the ability to link against tagged versions of libraries my code depends on is a loss in this setup.
If I understand correctly, depending on tagged versions of most of the ipfs/go-* repos is a bit dicey. The pre-v2 modules (almost all of them) mean that if one of your dependencies drags in something higher, you have to pull it in anyway. Realistically, figuring out which versions of libraries to depend on is a pain for anyone not working on these repos daily (and sometimes even for those of us who are).
Versioning together makes dealing with this much easier. Of course, if you need to fork go-libipfs or some subpackage, you're welcome to. If you want to contribute that code back to go-libipfs you're welcome to do that too.
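To make the "drags in something higher" point concrete, here is a minimal sketch of the rule behind it: Go's minimal version selection picks, for each module path, the highest version that any requirement in the build asks for. The module paths below are hypothetical, the version parsing is simplified (no pre-release tags), and the real `go` command walks the full module graph rather than a flat list, so treat this as an illustration rather than how the toolchain is implemented.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// requirement is one "require" line as seen somewhere in the build.
// Module paths used in main are hypothetical examples.
type requirement struct {
	Module  string
	Version string
}

// parse splits a version such as "v0.4.1" into its numeric parts.
// Simplified on purpose: pre-release and build metadata are not handled.
func parse(v string) [3]int {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	for i, p := range parts {
		n, _ := strconv.Atoi(p) // malformed parts count as 0 in this sketch
		out[i] = n
	}
	return out
}

// less reports whether version a is older than version b.
func less(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := range pa {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// selectVersions keeps, for each module, the highest version requested by
// any requirement. That is the outcome minimal version selection produces
// for a single module path.
func selectVersions(reqs []requirement) map[string]string {
	selected := make(map[string]string)
	for _, r := range reqs {
		cur, seen := selected[r.Module]
		if !seen || less(cur, r.Version) {
			selected[r.Module] = r.Version
		}
	}
	return selected
}

func main() {
	reqs := []requirement{
		{"example.com/blockstore", "v0.2.0"},  // what your own go.mod asks for
		{"example.com/blockstore", "v0.10.1"}, // what a transitive dependency asks for
		{"example.com/exchange", "v1.3.0"},
	}
	selected := selectVersions(reqs)

	// Sort module paths so the printed order is stable.
	mods := make([]string, 0, len(selected))
	for m := range selected {
		mods = append(mods, m)
	}
	sort.Strings(mods)
	for _, m := range mods {
		fmt.Println(m, selected[m])
	}
}
```

With many separately versioned pre-v2 modules, every one of them is a place where a transitive dependency can raise the selected version; with a single module there is only one version to reason about.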
go-libipfs is becoming the kitchen-sink replacement; it's just a shell game of kitchen sinks.
We believe doing this at a library and binary level are pretty different things. A binary level “put everything in here” allows for very little choice in what you support and results in either one-size-misfits-all defaults or unwieldy config files. A repo that has tons of sub-packages is not really that; you can use the ones you want and not use the ones you don’t.
As Gus wrote above, there will still be “Careful consideration of cross-package dependencies.” Just because the packages live together and version together doesn’t mean we want them to all depend on each other.
Want a peertaskqueue? You need all of go-libipfs. Want to manipulate IPFS-compatible paths? Go get all-the-things and you can do that!
Aside from whether that particular repo should be moved, what are you concerned about here, binary bloat? It seems like Go should mostly avoid that now with module pruning + lazy loading.
we have trouble putting things down and saying goodbye and telling users that that thing isn't supported … you're making it much much harder to retire components
Yes, generally the more users we have dependent on a given chunk of code the more we try to avoid breaking them. Generally speaking, when we are able to retire components or make breaking changes to them it comes with an effort around communication and making upgrade paths doable.
This happened with the go-ipfs-blockstore changes (dropping v0 support and breaking changes around context plumbing). It was a pain for many people depending on them, but the status quo was painful too, so we made the changes, bubbled them up and communicated with people about when the changes would be coming. A lot of the plumbing there was pretty miserable and the number of repos we had to communicate around was painful too. With a smaller number of repos these kinds of changes would be easier to execute and communicate about.
One of my personal gripes is around the previous generation IPLD tooling
Jorropo’s proposal to bring in go-unixfs was to tag it with the “style” of the module ipfs/boxo#36. I’d hope over time we’d only feel the need to support one and take the best of what we need from each and help users migrate over, but if they’re sufficiently different and important to the community then we can have multiple in there.
By consolidating and etching them in stone here, we're never going to be rid of them and we're edging out room for innovation.
I’m not sure what this means. As before we’re going to keep maintaining existing code and working to make things better, which might include migrations and breaking changes.
Is a chief concern here the repo name “ipfs/go-libipfs”? Is the feedback that this should be named something different? Gus posted this #8543 (comment) in November 2022 with no comments to the contrary and so life has moved on.
To be clear, I’m fully supportive if Bedrock (or any other team) would like to make a ipfs/go-greatipfslib repo rather than contribute to go-libipfs. It can be listed on docs.ipfs.tech’s list of implementations.
I also think the definition of "IPFS" is a huge problem here, you're essentially saying that it's all of the things in go-libipfs.
I’m going to sidestep the issue of the definition of IPFS. We’ll update the README to be clear that IPFS != all the things in go-libipfs, but rather if you’d like to build an IPFS implementation here are some tools you might want that are maintained by a group that has long-term commitments to the IPFS project.
The fact that some of the repos Bedrock maintains and doesn’t want included here (e.g. go-car, go-graphsync, most things IPNI related) are useful to IPFS implementations is fine. Similarly, if someone uses Rust to make a UnixFS implementation that could be used through Go (via FFI or WASM) that’s cool too and absolutely doesn’t need to be part of go-libipfs (and likely shouldn’t).
Our goal is to help people build things. Right now they can’t find anything or figure out how to use what they do find so they run kubo and use its HTTP RPC API… We’d like them to be able to do better. Taking the libraries they were already effectively relying on in production and making them more easily discoverable and usable is one way we’re trying.
I really, really don't want to have to pull in this new beast repo just to get simple things done that might happen to go anywhere near the Kubo-style of "IPFS" and now find myself developing around the deprecation landmines that are being regularly set off.
We might have to diverge here, but what are you thinking is going to go wrong here? (I want to make sure we’re knowingly entering risks here.) However, if you find it easier to fork and maintain a subset of functionality or rewrite things in an alternative style that’s fine. If you want that code to be usable with anything that already exists you’ll need some bridging code (e.g. like the go-ipld-prime storage adapters) and if not then you won’t.
What makes a repo in or out of libipfs? I think the answer that we're ending up with is: if the kubo team primarily maintains it, it's in libipfs. To me this is a weird place to end up for a "general purpose go library for IPFS"
Being on the front lines of support issues, user conversations, etc. and handling much of the maintenance of these repos, we have a perspective on what will make users’ lives better. I admit we are certainly giving preference currently to the repos that we believe help users and Kubo maintainers. I’m not going to claim this ensemble of repos is perfect, and feedback is welcome. I do want to make sure it’s understood though that this isn’t being formed in a vacuum.
Bitswap and GraphSync seem like two widely deployed data transfer protocols for content addressed data. But I have to import libipfs if I want Bitswap while GraphSync is its own repo.
Yeah, you’re right. Per above, libipfs isn’t exhaustive, and we’ll update the docs (ipfs/boxo#171) to make this clearer. Per Gus’s comment, there isn’t any opposition to graphsync being in libipfs in principle. But given that the only folks who have signed up for go-libipfs maintenance don’t currently maintain go-graphsync, and because IPFS Stewards haven’t encountered users asking how to pull it in to solve their problem, I agree with Gus that it doesn’t make sense to include it currently.
I believe the missing signal here is new developers in the ecosystem, which are the ones who might possibly be helped by less complex repo structure. @rvagg, @willscott, and I are all devs who are experienced in the existing structure, so this can only be disruptive for us.
I agree this is key. I realize this issue description doesn’t make that clear as this effort was originally motivated by the maintenance drain for the EngRes IPFS Stewards team. It was fueled and escalated for us though as:
- we saw the repeated challenges users were running into (discussed in previous comments) AND
- the team leaned into the expectation put on them to better support the IPFS implementer community as a whole (rather than just the Kubo implementation)
I also wonder though if a big repo rearrange is the right approach to "making it easier for new devs". It seems to me that the real barrier to entry for new devs who don't want to just talk to one of the IPFS implementations' HTTP APIs is the lack of step-by-step instructions on how to build a functioning IPFS node from the various go libraries. I feel like this could be a first step before a big repo re-org. I'm not going to get very far with go-libipfs if I don't have some step-by-step code on how to use it.
Not complete, but this has been the intent of the go-libipfs examples. We make sure these pass in CI. Some gateway functionality is covered now, and we plan to expand this as a more scalable mechanism for handling IPFS support inquiries. Contributions are also welcome here.
Closing thoughts
Here are some actions I believe should be taken:
- Move relevant parts of this discussion into the go-libipfs README. I have started this here and I have asked the go-libipfs maintainers to finish this off.
- @guseggert to formalize the list of repos that will be migrated. We have this comment here but I think we should ensure there is internal alignment and get it into the issue description for clarity. Tracking item: ipfs/boxo#174
- @Jorropo Move out the RAPIDE PR and issues to be clear. (I don’t know how much net gain this has at this point for the time involved, but it seems right given some of the confusion it caused.)
- If there is more that concerned parties (Bedrock or other) want to raise, I think we should move to a synchronous time this week (week of 2023-02-20) to get commitment. I can schedule this. I want to avoid the long turnaround times and hours tied up in responding.
Thanks again all for the input here. Your patience with reading, engaging, and dealing with the fallout are appreciated.
Give a 👍 here if you'd like to be involved in a synchronous closeout this week.
Is there a realistic outcome that consists in "leaving things as they were"? Because if I'm reading it right, that is the request from comments above: to reconsider this effort.
I don't think we can hide behind the "we're doing this for the developers" in a way that holds together. We have (had*) a consistent approach, which was "every distinct component lives in its repository" and we had a consistent developer workflow (open PR, get approval if needed, merge, release the component separately). It was not perfect but it was consistent and most flexible. The original issue description does not mention the topic of "supporting developers", because that was not what was driving the rationale of the change (rather it was the code maintenance effort paid by Stewards). The best way to help users and developers is mostly uncontroversial and consists in more, better and improved documentation.
Move out the RAPIDE PR and issues to be clear.
I don't think it's worth it. The issues were a mistake on my part, but I don't see what moving them will do now that everyone has been spammed with notifications. (I will if anyone insists.)
About the PR, it is not targeting master, so I don't see what should be done here. Should I make a go-libipfs-rapide fork inside the ipfs org?
Circling back to this quickly. The team has been tied up with other release and operational events this week.
Is there a realistic outcome that consists in "leaving things as they were"? Because if I'm reading it right, that is the request from comments above: to reconsider this effort.
The request to reconsider is heard. Leaving things as they were isn't an outcome we're considering currently.
But... before doing more disruption here there are at the minimum more communication and project management steps to do including:
- Update the top-level description with the condensed summary of more of the analysis that was done to push the maintainers in this direction and why this has been getting acted upon in the last few months despite being up for a year prior. Some of this is amongst the comments above, but warrants being stated coherently in one place. This includes speaking to alternatives for helping developers like the suggestion of "more, better and improved documentation"
- ipfs/boxo#174
- ipfs/boxo#170
There are also additional items related to better project maintenance/hygiene after Kubo 0.19 ships:
- ipfs/boxo#180
- ipfs/boxo#175
- ipfs/boxo#181
- (more that get added will show up under this label)
I understand that delayed responses and communication here aren't ideal. The offer to go-back-and-forth verbally still stands too in the interim.
2023-02-22 update (that was also in FIL Slack):
Kubo maintainers aren't going to do more here until we have our ducks in a row, have communicated more, and have confidence that we can pull this off with our current staffing.
We have been slowed down in making improvements last week and this week due to operational events and focusing on this week's Kubo release.
We'll return to this more next week.
If there is a dependency or access issue in the short term we can help address or debug, please share.
More communication next week.
Is there a realistic outcome that consists in "leaving things as they were"? Because if I'm reading it right, that is the request from comments above: to reconsider this effort.
I don't think we can hide behind the "we're doing this for the developers" in a way that holds together. We have (had*) a consistent approach, which was "every distinct component lives in its repository" and we had a consistent developer workflow (open PR, get approval if needed, merge, release the component separately). It was not perfect but it was consistent and most flexible. The original issue description does not mention the topic of "supporting developers", because that was not what was driving the rationale of the change (rather it was the code maintenance effort paid by Stewards). The best way to help users and developers is mostly uncontroversial and consists in more, better and improved documentation.
We have had consistent feedback from users and PL new hires that the sheer volume of repos and effort required to plumb changes around is immediately off-putting. I was one of those new hires when I created this issue, there have been others with the same sentiment, there are users who contribute with the same sentiment, and there's likely a large group of people who would have contributed or built on this ecosystem but don't because the cost is so high. I'm not sure how addressing this issue is not "for the developers" ? Sure it may not be for all developers but there is a clear signal from many folks that they don't like working with the existing setup.
There is no more documentation, there are still dozens of modules (even in the same repo) and understanding things is complex. Plumbing a change that touches everything will be easier, but the contribution flow for people like me or anyone that uses modules separately in mix&match&fork fashion is much worse.
there are users who contribute with the same sentiment, and there's likely a large group of people who would have contributed or built on this ecosystem but don't because the cost is so high
Not only is this an assumption, but it also doesn't take into account that the people who are OK with how things were are not reaching out to remind you that they were happy.
This is a very complex change, with uncertain outcome, based on assumptions and personal preferences, that alters the status quo and has been contested. These are all flags that it is not an endeavor to take in a world with much much lower hanging fruit and more pressing matters when it comes to the adoption of our tech. I sincerely hope I am wrong and hope to see an uptick in contributions from the community in the coming months. ☮️
2023-03-09 update:
- Given some of the feedback in this repo and elsewhere, the @ipfs/kubo-maintainers aren't planning to make as disruptive a change as before. Rather than fully "migrating" the repos, we'll copy them into go-libipfs with deprecated types and a "not maintained" readme message. This is covered more in ipfs/boxo#191 (comment) . Feel free to read more there.
- This consolidation-via-copying is being planned and tracked in ipfs/boxo#196 . There will be more activity on this endeavor week of 2023-03-13. For those interested, please follow along in that issue or the issues therein.
- This specific issue will get closed out once relevant portions of this issue are extracted out into docs/issues within go-libipfs (e.g., ipfs/boxo#190 ). This will happen week of 2023-03-13.
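As an illustration of what "deprecated types" can look like in Go, here is a minimal sketch using a type alias with a `// Deprecated:` doc comment, which is the convention Go tooling recognizes. The type names are hypothetical, both types sit in one file only to keep the example runnable, and the thread does not spell out the exact mechanism the maintainers will use, so this is an assumption about the general approach rather than a description of the actual change.

```go
package main

import "fmt"

// Blockstore stands in for a type whose canonical home is now the
// consolidated repository (hypothetical name and shape).
type Blockstore struct {
	blocks map[string][]byte
}

// NewBlockstore returns an empty in-memory Blockstore.
func NewBlockstore() *Blockstore {
	return &Blockstore{blocks: map[string][]byte{}}
}

// Put stores data under key.
func (b *Blockstore) Put(key string, data []byte) { b.blocks[key] = data }

// Has reports whether key is present.
func (b *Blockstore) Has(key string) bool {
	_, ok := b.blocks[key]
	return ok
}

// LegacyBlockstore is what the old repository could keep exporting so
// that existing imports continue to compile.
//
// Deprecated: use Blockstore instead.
type LegacyBlockstore = Blockstore

func main() {
	// Code written against the old name keeps working...
	var old *LegacyBlockstore = NewBlockstore()
	old.Put("k", []byte("v"))

	// ...and because an alias is the same type, no conversion is needed
	// when passing the value to code that uses the new name.
	var current *Blockstore = old
	fmt.Println(current.Has("k"))
}
```

Because the alias and the new type are identical, consumers can move to the new import at their own pace while editors and linters surface the deprecation notice.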
Closing this as the work is happening in ipfs/boxo and relevant docs have been updated there including: