Sharing IPFS daemon (by default) across all the embedders
Gozala opened this issue · 17 comments
Problem
At the moment I have 3 IPFS nodes running on my machine, embedded by the following applications.
And when I start test-driving anytype, I'll likely end up with 4 nodes.
It is not ideal, because:
- IPFS is resource intensive (on my MacBook Pro an IPFS node drains the battery about 5 times faster than without one). That, multiplied by 3x (or possibly more as more apps become popular), is terrifying.
- As a user I would like a canonical place where the data stored by apps is aggregated; instead it gets scattered across a bunch of IPFS nodes. That also means there is no simple way for me to mirror data on other nodes (a server or another device). More broadly, the current situation does not align with the WebOS vision.
- We are replicating the data silos that we hate from the conventional web. Sure, data isn't gated behind a corporate server farm, but if a user can't easily interact with data trapped in another app's embedded node, the end result is not much different. _(With the caveat that here you can copy a dataset across nodes, but propagating changes across nodes is non-trivial, and apps generally don't have a good story even for doing the copying.)_
Proposal
The `ipfs daemon` and `ipfs` tool and library should, by default, attempt to discover whether an IPFS service is already running and, if so, just act as a frontend to it; otherwise they should spawn a service and still act as a frontend to it. That would make all embedders share the same IPFS node by default. Dealing with version incompatibilities is going to be an interesting challenge, but even if this only addressed the problem for same-versioned nodes, it would be a good step forward IMO.
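To make the proposal concrete, here is a minimal sketch of what that discovery flow could look like, assuming the conventional repo layout where a running daemon records its HTTP API multiaddr in `$IPFS_PATH/api` and answers `/api/v0/version`. The error handling and timeouts are illustrative, not what go-ipfs currently does.

```go
// Sketch only: probe for an already-running daemon, otherwise spawn one.
package main

import (
	"fmt"
	"net/http"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"
)

// apiAddr reads the API multiaddr a running daemon records in its repo,
// e.g. "/ip4/127.0.0.1/tcp/5001", and turns it into an HTTP base URL.
func apiAddr() (string, error) {
	repo := os.Getenv("IPFS_PATH")
	if repo == "" {
		home, err := os.UserHomeDir()
		if err != nil {
			return "", err
		}
		repo = filepath.Join(home, ".ipfs")
	}
	raw, err := os.ReadFile(filepath.Join(repo, "api"))
	if err != nil {
		return "", err
	}
	parts := strings.Split(strings.TrimSpace(string(raw)), "/")
	if len(parts) < 5 {
		return "", fmt.Errorf("unexpected api file contents: %q", raw)
	}
	return fmt.Sprintf("http://%s:%s", parts[2], parts[4]), nil
}

// daemonAlive checks whether anything answers on the recorded endpoint.
// Newer go-ipfs releases require POST for API calls; older ones also took GET.
func daemonAlive(base string) bool {
	resp, err := http.Post(base+"/api/v0/version", "", nil)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	// 1. Reuse a daemon somebody else already started.
	if base, err := apiAddr(); err == nil && daemonAlive(base) {
		fmt.Println("acting as a frontend to the daemon at", base)
		return
	}
	// 2. Nothing running: spawn a daemon and wait for it to come up,
	//    so the next embedder finds it and reuses it too.
	cmd := exec.Command("ipfs", "daemon")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	for i := 0; i < 30; i++ {
		time.Sleep(time.Second)
		if base, err := apiAddr(); err == nil && daemonAlive(base) {
			fmt.Println("spawned and now fronting the daemon at", base)
			return
		}
	}
	panic("daemon did not become reachable in time")
}
```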
Created issue for radicle: radicle-dev/radicle-alpha#677
Created issue for Textile: https://github.com/textileio/photos-desktop/issues/37
I also just verified that data does not seem to be accessible across my local nodes; specifically, the IPFS Desktop IPLD explorer does not manage to resolve a CID for content in my Textile IPFS node. That is surprising, as I would imagine it would at least manage to do it via the public gateway -> Textile cafe, but even then it would be ridiculous to have to go through the cloud to access blocks that are on the same device.
I believe go-libp2p-daemon aims to address some of that:
A standalone deployment of a libp2p host, running in its own OS process and installing a set of virtual endpoints to enable co-local applications to: communicate with peers, handle protocols, interact with the DHT, participate in pubsub, etc.
ROADMAP.md mentions relevant medium-term goals:
- Multi-tenancy, one application = one identity = one peer ID.
- app <> daemon isolation; trust-less scenario; programs should not be able to interfere or spy on streams owned by others.
Mentioning @bigs @raulk @Stebalien @vyzo here in case the plan has changed or anything related could be linked in this issue.
I also just verified that data does not seem to be accessible across my local nodes; specifically, the IPFS Desktop IPLD explorer does not manage to resolve a CID for content in my Textile IPFS node
I agree: if nodes share the DHT, then when running on the same host they should see each other immediately. IIRC we don't have any mechanism for local discovery other than mDNS (which is an issue on its own):
- "MDNS" might be disabled in config of some of nodes you use. It should be
Enabled
:$ ipfs config --json Discovery.MDNS { "Enabled": true, "Interval": 10 }
- "MDNS" discovery did not work in some contexts in go-ipfs 0.4.21: ipfs/kubo#6359 (try 0.4.22-rc1 or later)
I agree with @lidel that the `go-libp2p-daemon` is meant to address this issue, but I believe applications don't need to embed `go-ipfs`; they should attempt to see if there is an already-running instance on the machine before starting their own. As a counterpoint, IPFS Cluster takes the opposite approach of requiring the user to have `go-ipfs` already running, but you can start any number of Cluster peers on the same machine and they will all use the same `go-ipfs` daemon by default.
@jkarni from radicle shared some interesting thoughts on the subject:
I agree it's not ideal, but our experiments indicated that joining the main network (which as I understand is still currently the only way to have a single daemon) substantially decreases performance.
I haven't profiled IPFS to know where the baseline resource usage is coming from, but if it's from maintaining connections and discovering peers, I'm also a little skeptical that a unified IPFS daemon (which is what I presume the go-ipfs-daemon comment in the linked thread is about) wouldn't either decrease performance, or end up with roughly the same resource usage as independent IPFS daemons, since the peers/connections radicle IPFS wants are ultimately a largely disjoint set from the rest of IPFS. (E.g., radicle doesn't need to know about DHT items for anything that's not radicle related, and for the most part won't retrieve IPFS data from anything that wasn't put there by another radicle instance.)
If sharing a node negatively affects performance, that suggests the node is doing a poor job of prioritizing tasks. If a node is conceptually idle, as in it's not acting on another application's behalf, it should give a submitted task higher priority than just being an idle peer in the network.
I think the argument about disjoint pinsets is interesting and sound; however, as IPFS becomes integrated into more and more tools we use daily, I don't think it will hold, as there will be more overlap across the peers we converse with.
And if we consider that one might wish to mirror all of one's own data somewhere (for backup or uptime), that mirror node will likely need to connect to all the peers, and sharing connections would be more optimal resource management.
applications don't need to embed `go-ipfs`; they should attempt to see if there is an already-running instance on the machine before starting their own.
If most applications should do that, that seems like a good argument in favor of making this the default behavior of go-ipfs, which they all embed. Otherwise everyone needs to solve the same problem.
That is something that they have to do on their side and can't be built into `go-ipfs`.
That is something that they have to do on their side and can't be built into `go-ipfs`.
Can you elaborate a bit on this please? I'm happy to challenge "can't be built", but I suspect you don't mean to say technically impossible but rather some specific constraint which I may be unaware of.
If I may interject. The Dat community is currently going through some similar developments.
We're planning on having a high level API provided to developers which will automatically spawn a daemon behind the scenes and talk to it over RPC.
We might use some of the ideas from the auto-daemon node module for this.
A big worry we have is how people will deal with different versions of the daemon. In the Secure Scuttlebutt community some applications opt to embed the daemon into themselves so that they can take advantage of new features without being blocked by other applications, for example.
Breaking changes are a pretty big deal when it comes to this stuff, too.
https://github.com/andrewosh/hyperdrive-daemon
https://github.com/andrewosh/hyperdrive-daemon-client
A big worry we have is how people will deal with different versions of the daemon. In the Secure Scuttlebutt community some applications opt to embed the daemon into themselves so that they can take advantage of new features without being blocked by other applications, for example.
The Flow type-checker does something that I find to be a reasonable compromise. At startup it checks whether a server process is already running. If the server and the launched flow instance are compatible (same version), it just connects to it. If there is a mismatch, it offers the user a choice to either use the different version of flow (the one already running) or abort (I think offering to just run the embedded one might be a better option).
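For what it's worth, the same handshake is easy to express against the IPFS HTTP API. A minimal sketch, assuming the daemon is reachable on the default port and that we compare its reported version against whatever version the application bundles (`embeddedVersion` below is a made-up placeholder); how to prompt the user on a mismatch is left open:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Hypothetical version shipped inside the application; not a real constant.
const embeddedVersion = "0.4.22"

// daemonVersion asks a running daemon for its version via the real
// /api/v0/version endpoint (POST is required on newer go-ipfs releases).
func daemonVersion(base string) (string, error) {
	resp, err := http.Post(base+"/api/v0/version", "", nil)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var v struct{ Version string }
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		return "", err
	}
	return v.Version, nil
}

func main() {
	base := "http://127.0.0.1:5001"
	got, err := daemonVersion(base)
	switch {
	case err != nil:
		fmt.Println("no daemon running, starting embedded node", embeddedVersion)
	case got == embeddedVersion:
		fmt.Println("compatible daemon found, connecting to", base)
	default:
		// Flow-style mismatch: let the user pick the running version,
		// abort, or (arguably better) fall back to the embedded node.
		fmt.Printf("version mismatch: daemon is %s, app embeds %s\n", got, embeddedVersion)
	}
}
```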
That is something that they have to do on their side and can't be built into `go-ipfs`.
Can you elaborate a bit on this please? I'm happy to challenge "can't be built", but I suspect you don't mean to say technically impossible but rather some specific constraint which I may be unaware of.
I mean, if you have a `go-ipfs` daemon running on your machine and then start the Textile application, is it the responsibility of the `go-ipfs` daemon to detect that Textile has a dependency on it, and somehow make the Textile application use it instead of spinning up its own embedded daemon?
I think it would be reasonable to write a library that applications can use to detect a running `go-ipfs` daemon and, if not found, start one, but that will only help if there is in fact a running daemon: if we assume both Textile and Radicle were using this library and there wasn't a running `go-ipfs` daemon, they would both end up starting their own embedded daemon.
Also, there are a bunch of issues if each application wants a differently configured daemon, which currently can only be solved by running separate daemons.
I believe both Textile and Radicle just spawn `go-ipfs`; if the `go-ipfs` process attempted to detect a running daemon and connect to it as a client (or spawned a daemon and retried detection) by default, all users spawning `go-ipfs` would just share the daemon.
That could be another library; however, I'm inclined to think that making it part of go-ipfs and making it the default behavior (with an opt-out flag) would be a better out-of-the-box configuration.
Yes, it's not trivial, as configurations may not be compatible, and the same goes for daemon versions; likely some per-embedder isolation will require thinking things through and coming up with defaults that make reasonable compromises. But I think that needs to be done either way. If IPFS is ever going to be a native component of an OS distribution or part of a browser, what to share and what to separate will have to be figured out. And the longer we ignore it, the harder it will be to migrate a growing number of users.
As of today, IPFS has support for running as a daemon. With some environment variables, the IPFS CLI will connect to that existing daemon via the HTTP API. All an application wishing to re-use an existing IPFS daemon needs to know is the URL for its HTTP endpoint. Applications wishing to reuse IPFS daemons--a good idea for situations when, as @lanzafame hinted, the applications are okay sharing a peer identity and configuration options--should optionally accept a URL for the daemon (or a path to the IPFS configuration, if you wanted.) This is all doable today.
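As a small illustration of "all an application needs is the URL for its HTTP endpoint": a sketch of an app that accepts that endpoint via its own flag (the `--ipfs-api` flag name here is made up for illustration) and queries the daemon's real `/api/v0/id` call instead of embedding a node, falling back only when nothing answers.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Application-level flag (hypothetical); the default matches the
	// daemon's usual API address.
	api := flag.String("ipfs-api", "http://127.0.0.1:5001",
		"HTTP API endpoint of an already-running IPFS daemon to reuse")
	flag.Parse()

	resp, err := http.Post(*api+"/api/v0/id", "", nil) // POST required on newer daemons
	if err != nil {
		fmt.Println("no shared daemon reachable, falling back to an embedded node")
		return
	}
	defer resp.Body.Close()
	id, _ := io.ReadAll(resp.Body)
	fmt.Printf("reusing daemon at %s, identity: %s\n", *api, id)
}
```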
If you wish to have multiple peer identities within a single process, things get more complicated. We've talked about doing this in the libp2p daemon, and are interested in the idea for testing purposes, but don't have a definite timeline for it.
This issue seems related. https://github.com/ipfs-rust/rust-ipfs/issues/136
It discusses some problems with the pin API when used by multiple applications. Some of those issues are:
- what happens when multiple applications pin the same block and then one unpins it
- what happens when an application gets deinstalled
- how can read/write operations on the blockstore be decentralized (not needing to go through an HTTP API or socket, as that would be very slow if all applications used IPLD as their main data store)
I don't have answers but would like to share some notes / ideas I had around this. I will also be spending time in the coming months on reducing the resource usage of js-ipfs in the browser context so that every single tab does not spin up a new node with separate connections, etc., so this is highly relevant to me.
- what happens when multiple applications pin the same block and then one unpins it
- what happens when an application gets deinstalled
These are good questions. I think a lot can be learned from how web browsers go about these things. While there is no exact equivalent of pinning, there is a resource caching layer, and multiple origins (web pages) can link to the same cached resource.
I would suggest that application uninstall is somewhat similar to closing a browser tab. The browser does not know whether that website is ever going to be accessed again, and it uses certain heuristics to do cache eviction.
Browsers also use some user interactions to guide those decisions, e.g. a web page can store data in IndexedDB, but it's subject to cache eviction. On the other hand, a web app can request persistent storage through an API, for which the browser will obtain user consent; that both informs the user and guides decisions about which data is not subject to eviction.
- how can read/write operations on the blockstore be decentralized (not needing to go through an HTTP API or socket, as that would be very slow if all applications used IPLD as their main data store)
It is worth looking at what SQLite does to enable multiple processes to work with the same database (quoting for convenience):
Can multiple applications or multiple instances of the same application access a single database file at the same time?
Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time, however.
SQLite uses reader/writer locks to control access to the database. (Under Win95/98/ME which lacks support for reader/writer locks, a probabilistic simulation is used instead.) But use caution: this locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations. You should avoid putting SQLite database files on NFS if multiple processes might try to access the file at the same time. On Windows, Microsoft's documentation says that locking may not work under FAT filesystems if you are not running the Share.exe daemon. People who have a lot of experience with Windows tell me that file locking of network files is very buggy and is not dependable. If what they say is true, sharing an SQLite database between two or more Windows machines might cause unexpected problems.
We are aware of no other embedded SQL database engine that supports as much concurrency as SQLite. SQLite allows multiple processes to have the database file open at once, and for multiple processes to read the database at once. When any process wants to write, it must lock the entire database file for the duration of its update. But that normally only takes a few milliseconds. Other processes just wait on the writer to finish then continue about their business. Other embedded SQL database engines typically only allow a single process to connect to the database at once.
However, client/server database engines (such as PostgreSQL, MySQL, or Oracle) usually support a higher level of concurrency and allow multiple processes to be writing to the same database at the same time. This is possible in a client/server database because there is always a single well-controlled server process available to coordinate access. If your application has a need for a lot of concurrency, then you should consider using a client/server database. But experience suggests that most applications need much less concurrency than their designers imagine.
When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY. You can adjust this behavior from C code using the sqlite3_busy_handler() or sqlite3_busy_timeout() API functions.
I think that quote also makes an interesting point about large database engines with a client/server architecture, which seem to have pretty high throughput, so maybe something can be learned from there as well.
While there is no exact equivalent of pinning, there is a resource caching layer, and multiple origins (web pages) can link to the same cached resource.
This works because it's just a cache. Imagine a block from the kernel or systemd that you need to boot your system is garbage collected. It is not unreasonable that this could happen in the future, and it's on the long-term roadmap for IPFS.
It is worth looking at what SQLite does to enable multiple processes to work with the same database (quoting for convenience)
This only works if all applications are both correct and trustworthy. How do you prevent an application from corrupting the database or removing other applications' data?
I think I outlined a possible solution in the issue I linked that could work and be made reasonably performant. Please punch holes in it if you can think of any!