Awesome Endeavour: Async Iterators
alanshaw opened this issue · 47 comments
JS IPFS supports two types of stream at the API level, but uses pull streams for internals. If I was working on js-ipfs at the time I'd have made the same decision. Since then, async/await became part of the JS language and the majority of JavaScript runtimes now support async/await, async iterators and for/await/of (i.e. no need to transpile). These tools give us the power to stream data without needing to rely on a library.
Just because there are new language features available doesn't mean we should switch to using them. It's a significant upheaval to change the core interface spec and its implementations (js-ipfs, js-ipfs-api etc.) without good reason.
That said, it has become apparent that there are a growing number of good reasons to do this:
- Reduction in bundle size - no need to bundle two different stream implementations and their eco-system helper modules, no need for the `async` module.
- Reduce `npm install` time - fewer dependencies to install.
- Allows us to remove a bunch of plumbing code that converts Node.js streams to pull streams and vice versa.
- Reduces API surface area, no `addPullStream`, `addReadableStream`.
- Building an `interface-ipfs-core` compatible interface becomes a whole lot easier, no dual promise/callback API and no multiple stream implementation variations of the same function. It would also reduce the number of tests in the `interface-ipfs-core` test suite for the same reasons.
- Node.js readable streams are now async iterators thanks to #17755!
- Of note, it is trivial to convert from pull stream to (async) iterator and vice versa.
- Unhandled throws that cannot be caught will no longer be a problem.
- Better stack traces, stacks no longer clipped at async boundaries, `await` stack traces better than promise stack traces.
- A modern, up to date and cutting edge API will aid community contributions and adoption.
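To make the "no library needed" point concrete, here is a minimal sketch of consuming a Node.js readable stream and an async generator with the same `for await` loop. The helper names `collect` and `numbers` are made up for illustration, and `Readable.from()` assumes a reasonably recent Node.js (12 or later).

```javascript
const { Readable } = require('stream')

// Collect every chunk from any async iterable (including a Node.js Readable).
async function collect (source) {
  const chunks = []
  for await (const chunk of source) {
    chunks.push(chunk)
  }
  return chunks
}

// An async generator is itself a stream-like source; no library required.
async function * numbers (limit) {
  for (let i = 0; i < limit; i++) {
    yield i
  }
}

async function main () {
  // Readable.from() builds a Readable stream from an iterable.
  const readable = Readable.from(['hello', ' ', 'world'])
  console.log((await collect(readable)).join(''))
  console.log(await collect(numbers(3)))
}

main().catch(console.error)
```

The same `collect` helper works for both sources because a Node.js Readable implements the async iterator protocol.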
The rough plan is:
- Drop support for dual callback/promise based APIs
- Expose only APIs that return promises or iterators for async actions
- Use async/await over then/catch when dealing with promises
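As a rough illustration of the plan, the sketch below contrasts a dual callback/promise function with a promise-only function and an async iterator. Everything here (`store`, `getDual`, `get`, `entries`) is hypothetical and is not taken from any IPFS module.

```javascript
// Hypothetical in-memory store standing in for a real datastore.
const store = new Map([['key', 'value']])

// Before: a dual API that accepts an optional callback or returns a promise.
function getDual (key, callback) {
  const promise = Promise.resolve(store.get(key))
  if (typeof callback !== 'function') return promise
  promise.then((value) => callback(null, value), callback)
}

// After: a promise-only API for single values...
async function get (key) {
  return store.get(key)
}

// ...and an async iterator for anything that yields multiple values.
async function * entries () {
  for (const [key, value] of store) {
    yield { key, value }
  }
}

// Consumers use async/await and for/await/of instead of then/catch.
async function main () {
  console.log(await get('key'))
  for await (const entry of entries()) {
    console.log(entry.key, entry.value)
  }
}

main().catch(console.error)
```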
This will require significant discussion and coordination from the JS teams. We'll need to reach agreement on the best API to expose for each module and manage releases carefully.
Below is a table documenting the multiformats, libp2p, ipld and ipfs modules that will likely need work. I suspect that some of these modules can be removed as they do not expose an async API. Likewise there are probably modules that got missed. If you notice either, please edit the table or comment below.
If you'd like to own this enhancement task for a module then please comment below (or add yourself to the table if you know what you are doing). Please open a PR against the module asap (does not have to be anywhere near complete!) so we can add it here also and track progress.
- π = Not started
- π = In progress
- π = Complete
Core
Multiformats
- Current progress:
- π 2/2
- π 0/2
- π 0/2
Module | PR | Owner | Status | Priority |
---|---|---|---|---|
multihashing-async | multiformats/js-multihashing-async#37 | @hugomrdias | π | P0 |
multistream-select | https://github.com/multiformats/js-multistream-select/releases/tag/v0.15.0 | @alanshaw | π | P1 |
libp2p
- Current progress:
- π 25/30
- π 3/30
- π 2/30
IPLD
Please read #1670 (comment) before contributing.
- Current progress:
- π 8/8
- π 0/8
- π 0/8
Module | PR | Owner | Status | Priority |
---|---|---|---|---|
ipld | ipld/js-ipld#190 | @vmx | π | P3+ |
ipld-bitcoin | ipld/js-ipld-bitcoin#48 | @vmx | π | P3+ |
ipld-dag-pb | ipld/js-ipld-dag-pb#124 | @vmx | π | P1 |
ipld-dag-cbor | ipld/js-ipld-dag-cbor#107 | @vmx | π | P1 |
ipld-ethereum | ipld/js-ipld-ethereum#51 | @vmx | π | P1 |
ipld-git | ipld/js-ipld-git#51 | @vmx | π | P1 |
ipld-raw | ipld/js-ipld-raw#32 | @vmx | π | P1 |
ipld-zcash | ipld/js-ipld-zcash#39 | @vmx | π | P1 |
IPFS
- Current progress:
- π 19/19
- π 0/19
- π 0/19
Dependents
These modules use IPFS and fall under the ipfs/ipfs-shipyard umbrella so should also be updated.
Current progress:
- π 1/44
- π 2/44
- π 40/44
Module | PR | Owner | Status | Priority |
---|---|---|---|---|
awesome-ipfs | | | π | |
benchmark-js.ipfs.io | | | π | |
cid-utils-website | | | π | |
demo-ipfs-todo | | | π | |
ipfs-companion | | @lidel | π | |
ipfs-desktop | | | π | |
ipfs-fuse | | @alanshaw | π | |
ipfs-geoip | ipfs-shipyard/ipfs-geoip#67 | @nijynot | π | P0 |
ipfs-iiif-db | | | π | |
ipfs-level | | | π | |
ipfs-blob-store | ipfs-shipyard/ipfs-blob-store#26 | @niinpatel | π | P3+ |
ipfs-locations | | | π | |
ipfs-performance-profiling | | | π | |
ipfs-pubsub-peer-monitor | | | π | |
ipfs-pubsub-room | | | π | |
ipfs-pubsub-room-demo | | | π | |
ipfs-redux-bundle | | | π | |
ipfs-registry-mirror | | | π | |
ipfs-postmsg-proxy | | @alanshaw | π | |
ipfs-pubsub-1on1 | | | π | |
ipfs-service-worker | | @vasco-santos | π | |
ipfs-share-files | | | π | |
ipfs-stats | | | π | |
ipfs-webui | | | π | |
ipfsd-ctl | ipfs/js-ipfsd-ctl#353 | @achingbrain | π | |
ipld-explorer | | | π | |
ipld-explorer-cli | | @alanshaw | π | |
ipld-explorer-components | | | π | |
ipscend | | | π | |
npm-on-ipfs | | @achingbrain | π | |
peer-crdt-textarea-binding | | | π | |
peer-flipchart | | | π | |
peer-pad-core | | | π | |
peer-star-app | | | π | |
peer-star-network-vis | | | π | |
peer-star-network-vis-react | | | π | |
peer-star-peer-color | | | π | |
peer-star-react | | | π | |
peerpad-peer-crdt | | | π | |
service-worker-gateway | | | π | |
tevere | | | π | |
window.ipfs-fallback | | @alanshaw | π | |
y-ipfs-connector | | | π | |
ipld-graph-builder | | | π | P3+ |
This is an "Everyone is Welcome" effort and an excellent opportunity for everyone to get their feet wet on the codebase, improve documentation and testing, and refresh the code.
Hoping that we can count on help from the IPFS GUI, IPFS in Web Browsers, Community WG and Dynamic Data & Capabilities. This will make our JS APIs better for all the modern/new JS developers!
@ipfs/javascript-team Unite ✨⚡️
this is awesome and makes me so happy to see
add me to multihashing-async, ipfsd-ctl, ipfs-multipart
@hugomrdias you got it! :)
Updated the table with my assignments
@alanshaw I'm taking a stab at libp2p/js-peer-id#87
Update: I added myself to `peer-info` too.
Thoughts and views on https://github.com/libp2p/interface-peer-discovery/issues/2 please!
@alanshaw I was thinking of taking a look at `ipld-*` and I noticed interface-ipld-format is missing from the table. Despite not being code, it also needs to be updated. I'll take a look at that one, does it make sense?
- `interface-ipld-format`: ipld/interface-ipld-format#47
- `ipld-zcash`: ipld/js-ipld-zcash#28
- `ipld-raw`: ipld/js-ipld-raw#21
- `ipld-bitcoin`: ipld/js-ipld-bitcoin#32
RE: IPLD
We (@vmx and I) aren't too happy with `interface-ipld-format` and have been talking about a re-write for some time now. Originally we had intended to use async functions as a test case "before advocating that all of js-ipfs moves to it" but you're beating us to it now :)
Perhaps we should accelerate the timeline for this refactor rather than trying to update the current interface?
Yeah, that would be awesome. @mikeal @vmx as you can see by my previous comment, I already created a PR on `interface-ipld-format`. Maybe we could continue the discussion there!
it would be great if you, @vmx and @hacdias could work together on that refactor/redesign
Sounds great, but I do want to make sure I'm prioritizing things in the right order (there's a lot on my plate from being out on vacation and now I'm sick). Is this refactor already blocking downstream refactors?
@mikeal as far as I know, it isn't since no one started refactoring IPLD nor anything that depends on it.
Just a note that we should use this opportunity to remove the object API as per the plan here ipfs-inactive/interface-js-ipfs-core#388 (comment)
If we're removing redundant/superseded APIs, there are some more suggestions here: https://github.com/ipfs-shipyard/ipfs-http/issues
i'll pick up ipfs-repo
Hi, I'm new. I'll pick up `ipld-git`: ipld/js-ipld-git#38
I do have a question. The entire library is sync, with `setImmediate`s for most callbacks. With async/await you are free to await a value that isn't a promise with no ill effect. Should `ipld-git` return promises to keep to a spec? Or will consumers of the `ipld-*` interface be using async/await and therefore tolerate non-promise values?
Thanks!
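For readers weighing the same question, here is a small sketch of what `await` does with a non-promise value. The functions `serializeSync` and `serializeAsync` are invented for illustration and do not come from `ipld-git`.

```javascript
// A synchronous function, as most of a sync library would expose.
function serializeSync (node) {
  return JSON.stringify(node)
}

// The same logic declared async: callers always receive a promise.
async function serializeAsync (node) {
  return JSON.stringify(node)
}

async function main () {
  // await on a plain value simply resolves to that value.
  const a = await serializeSync({ x: 1 })
  const b = await serializeAsync({ x: 1 })
  console.log(a === b)

  // The difference only shows for consumers that call .then() directly:
  // a plain string has no .then, a promise does.
  console.log(typeof serializeSync({}).then)
  console.log(typeof serializeAsync({}).then)
}

main().catch(console.error)
```

Consumers that always `await` cannot tell the two apart, but anything that chains `.then()` on the return value needs a real promise, which is one argument for declaring the functions `async` anyway.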
@reconbot awesome! The IPLD refactor is going to be a bigger effort than a simple switch to async/await and any PRs you make should be done in the context of ipld/js-ipld#185. @vmx is the champion for JS IPLD. @vmx, do you have a moment to focus @reconbot's energies in the right places?!
As @alanshaw mentions, there's bigger changes in the IPLD API (ipld/js-ipld#185) and IPLD Formats API (ipld/interface-ipld-format#50) upcoming. I'm actively working on those. Once the new IPLD API is there (I'm working on it), we can tackle the IPLD Formats one.
To all contributors. Please hold on for a moment and don't spend time on making the IPLD Formats Promise/async-await based. I'll post an update here, once it makes sense to work on it.
No worries, it was worth it for me to dive in and figure it out. All things considered it was a pretty small, straightforward change. And I figured there's a huge unwritten dependency tree to the rewriting process.
I was looking at some of the libp2p stuff next but I imagine it's got a similar api redesign going on?
We'll be evaluating the js-libp2p api itself, and likely js-libp2p-switch as a result of that. Some of the interface modules are a blocker for the transports, and we should be able to get those done. Any potential api changes there shouldn't be a blocker for moving them to async. In particular: `interface-connection`, `interface-stream-muxer`, and `interface-transport`. They weren't previously in the table, so I added them. Those would help free up work for many of the other libp2p modules.
Hi, just refactored `ipfs-geoip` (ipfs-shipyard/ipfs-geoip#67).
I'll try to take on `ipfs-bitswap` too if that's fine.
A note for everyone working on this endeavour: if you refactor code that still uses the `async` module, e.g. a `waterfall`, you need to be careful.
I thought I could replace a call like

    (files, callback) => ipld.get(…, callback),

with (where `first()` returns a Promise):

    async (files) => ipld.get(…).first(),

This works in Node.js, but it won't work with our Browser build as we transpile to older ES versions.
Instead you need to resolve the Promise like this:

    (files, callback) => ipld.get(…).first().then(
      (node) => callback(null, node),
      (error) => callback(error)
    )

Update: You can also use `asyncify` (see https://caolan.github.io/async/global.html for more information):

    const asyncify = require('async/asyncify')
    …
    asyncify(async (files) => ipld.get(…).first()),

If you want to know if your current code works in the Browser you need to run the tests in production mode:

    NODE_ENV=production npm run test

(this needs ipfs/aegir#325 to be fixed).
Credit for finding this issue goes to @achingbrain as he mentioned the transpiling issue during code review.
Please make an issue in aegir about this, with a generic repro and pointing to terser as the problem. This is probably fixable.
BTW next release aegir will have minification in the karma tests.
Following is a list of modules that I created (that probably don't exist on npm otherwise) and needed in pursuit of this endeavour. They might be useful to you too:
- Parallel mapping for async iterators
- ndjson as an async iterator
- Convert a (async) iterator to a pull stream
- Convert a pull stream to an async iterator
- If an iterator errors, restart and continue
- Make any iterator or iterable abortable via an AbortSignal
- Get the default iterator or async iterator for an Iterable
- Convert async iterator to event emitter
...and some that I found that I didn't have to write that were useful:
- Convert event emitter to async iterator
Please comment with yours...
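To give a flavour of how small such helpers can be, here is a minimal sketch of an ndjson parser written as an async generator. It is not the implementation of any module listed above; the name `parseNdjson` is hypothetical and the sketch assumes the source yields string chunks (not Buffers).

```javascript
// Parse newline-delimited JSON from any (async) iterable of string chunks.
// A chunk may end in the middle of a line, so the unfinished tail is buffered.
async function * parseNdjson (source) {
  let buffer = ''
  for await (const chunk of source) {
    buffer += chunk
    const lines = buffer.split('\n')
    buffer = lines.pop() // last element is an incomplete line (or '')
    for (const line of lines) {
      if (line.trim() !== '') yield JSON.parse(line)
    }
  }
  // Flush whatever remains once the source ends without a trailing newline.
  if (buffer.trim() !== '') yield JSON.parse(buffer)
}

async function main () {
  const chunks = ['{"a":1}\n{"b"', ':2}\n', '{"c":3}']
  for await (const obj of parseNdjson(chunks)) {
    console.log(obj)
  }
}

main().catch(console.error)
```

Because the helper is just an async generator, it composes with other async iterables by passing one as the `source` of the next.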
For a similar endeavor on a different project I made https://github.com/bustle/streaming-iterables. We had a lot of stream-based pipelines that needed replacing, but it's started getting used all over the place.
If you want to promisify something, please use promisify-es6 and not the Node.js built-in one to keep the bundle size small.
@vmx there are performance advantages in Node.js to using the builtin. Can we write a tiny intermediary library that tells webpack/browserify to replace the builtin with promisify-es6 and use that instead?
@mikeal Would be cool, but I think currently the bundle size matters more. If anyone finds the time, the polyfill could be extracted from the one webpack is using: https://github.com/defunctzombie/node-util/blob/fb06a0f973f0203762714dc16c19d4d0644d6fb0/util.js#L600
I've made PRs against a few repos that have not yet been captured in the table above:
- js-datastore-s3: ipfs/js-datastore-s3#17
- js-ipfs-block-service: ipfs/js-ipfs-block-service#85
- js-libp2p-bootstrap: libp2p/js-libp2p-bootstrap#89
- js-libp2p-kad-dht: libp2p/js-libp2p-kad-dht#82
- js-libp2p-mdns: libp2p/js-libp2p-mdns#78
- js-libp2p-record: libp2p/js-libp2p-record#13
I'm doing ipfs-blob-store: ipfs-shipyard/ipfs-blob-store#26
I like the `pify` module.
One usage is `await pify(cb => oldThing(arg1, arg2, cb))()`.
Not sure which `promisify` module you're standardizing on, but this style may also work.
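For anyone unfamiliar with the pattern, the sketch below hand-rolls a tiny `promisify` to show both usage styles. It is neither `pify` nor `promisify-es6`; `oldThing` is a made-up callback API, and this version does not preserve `this` binding.

```javascript
// Hypothetical stand-in for a legacy callback API.
function oldThing (a, b, callback) {
  setImmediate(() => callback(null, a + b))
}

// Minimal promisify: wraps a function whose last argument is a
// Node.js-style (err, result) callback.
function promisify (fn) {
  return (...args) => new Promise((resolve, reject) => {
    fn(...args, (err, result) => (err ? reject(err) : resolve(result)))
  })
}

async function main () {
  // Wrap the whole function...
  const sum = await promisify(oldThing)(1, 2)
  // ...or wrap a single call site, as in the comment above.
  const same = await promisify((cb) => oldThing(1, 2, cb))()
  console.log(sum, same)
}

main().catch(console.error)
```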
For the async refactors of larger modules, I recommend taking an iterative approach
I tried my hand at one here, please critique libp2p/js-libp2p-kad-dht#108
@kumavis ideally we'd tackle the list according to the priority so that no single module refactor is blocked by other module refactors and also to minimise the time a large refactor PR stays open.
We need to consider carefully any work we do to allow both implementations to exist at the same time since it is temporary (will be removed when everything switches), can have perf/maintenance implications and can be confusing to new contributors.
I appreciate it isn't always possible and the DHT is a tricky case because there's a lot of active development trying to land this feature. In hindsight the async await refactor here probably shouldn't have been attempted yet (considering its priority is P3+).
@alanshaw that's what I like about my iterative non-breaking approach
- no breaking changes to public interface or behavior, only internal changes
- wrap consumed apis in promisify/etc
- not blocked by dependencies
- ready to merge now
- async refactor of large module can be done across a few PRs to make review easier and avoid conflicts with ongoing work
when the external API is ready to be moved to async, you're just removing the promisify/callbackify layer, and looks a little something like this
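A minimal sketch of what such a temporary promisify/callbackify layer could look like follows. This is an assumption about the shape of the approach, not code from the linked PR: `datastore`, `lookupAsync` and `lookup` are hypothetical, and Node's built-in `util` helpers are used only for brevity (elsewhere in this thread `promisify-es6` is preferred for browser bundles).

```javascript
const { promisify, callbackify } = require('util')

// Hypothetical callback-based dependency that has not been refactored yet.
const datastore = {
  get (key, callback) {
    setImmediate(() => callback(null, `value-for-${key}`))
  }
}

// Step 1: wrap consumed callback APIs so internals can use async/await.
const getAsync = promisify(datastore.get.bind(datastore))

// Step 2: write the internals as async functions.
async function lookupAsync (key) {
  const value = await getAsync(key)
  return value.toUpperCase()
}

// Step 3: keep the public callback interface unchanged for now.
// Once the external API moves to async, export lookupAsync directly
// and delete this wrapper.
const lookup = callbackify(lookupAsync)

lookup('a', (err, value) => {
  if (err) throw err
  console.log(value)
})
```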
Where is the `ipld-selector` module listed in the chart? I can't find the repo and there is no module on npm by that name.
I wanted to see the remaining red repos by priority; here's what the top ones currently look like
### P0
libp2p-connection-manager
libp2p-pubsub
libp2p-pnet
ipfs-multipart
### P1
libp2p-spdy
libp2p-utp
libp2p-webrtc-direct
ipfs-http-response
try/catch/finally felt awkward in some cases where callbacks were more flexible
for example when checking an abort signal and possibly ignoring an error

    let res, queryError
    try {
      res = await this.path.queryFuncAsync(peer)
    } catch (err) {
      queryError = err
    }
    // Abort and ignore any error if we're no longer running
    if (!this.running) {
      return
    }
    if (queryError) {
      this.run.errors.push(queryError)
      return
    }
    // continue, using res
here's an alternative inspired by rust syntax

    const [res, err] = await tryCatch(() => this.path.queryFuncAsync(peer))
    // Abort and ignore any error if we're no longer running
    if (!this.running) {
      return
    }
    if (err) {
      this.run.errors.push(err)
      return
    }
    // continue, using res

where `tryCatch` is

    async function tryCatch (fn) {
      try { return [ await fn() ] } catch (err) { return [undefined, err] }
    }
You can use promises interface in these cases
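One possible reading of this suggestion, sketched under the assumption that "promises interface" means `.then(onFulfilled, onRejected)`: the rejection is turned into a value without `try`/`catch` or a helper. `query`, `runQuery` and `state` are hypothetical stand-ins for `queryFuncAsync` and the surrounding class.

```javascript
// Hypothetical query function standing in for queryFuncAsync.
async function query (peer) {
  if (peer === 'offline') throw new Error(`cannot reach ${peer}`)
  return `response from ${peer}`
}

async function runQuery (state, peer) {
  // Convert a rejection into a value using the two-argument form of .then().
  const [res, err] = await query(peer).then(
    (value) => [value, undefined],
    (error) => [undefined, error]
  )

  // Abort and ignore any error if we're no longer running
  if (!state.running) return

  if (err) {
    state.errors.push(err)
    return
  }

  // continue, using res
  return res
}

runQuery({ running: true, errors: [] }, 'peer-1').then(console.log, console.error)
```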
I think it's better to keep the JS idiomatic

    let res
    try {
      res = await this.path.queryFuncAsync(peer)
    } catch (err) {
      if (!this.running) {
        // Abort and ignore any error if we're no longer running
        return
      }
      this.run.errors.push(err)
    }
    // continue, using res

Though it's hard to see how you would continue using res in this case, as it would be undefined, but I'm missing context.
Hi y'all, AFAIU this transition would be something that would make everyone more productive and happy. Is there anything we can do to accelerate it? For example, could we have one week in which the team does nothing else other than shipping this refactor?
Yes, that would push it along a fair bit I reckon! It's taking so long because it's not top of anyone's priority list.
Totally agree
When I started making async/await PRs I took the opportunity to fix bugs and make improvements to the code, but actually I think it just makes it more confusing for reviewers and slows the process down.
I noticed that @achingbrain followed a strategy of purely changing from callbacks to async/await. He would open issues for any bugs or improvements so they could be worked on separately, I think that's the best approach.
Just wanted to shout my appreciation and support to everyone pushing for this effort. I've gone through yet another round of spelunking through the refactors and OHMY, so much effort here! Excited to see the result! ❤️
Thank you EVERYONE who contributed to this. It has been the biggest journey I've ever taken - you've all been with me the whole way and you're all incredible. There's still some bits to finish up but expect to see all this goodness in js-ipfs 0.41 soon.
I'll also write a blog post and try to note down some of the many wins this work has made and enabled for the future.
Once again ❤️ to you all.