ipfs/js-ipfs

Awesome Endeavour: Async Iterators

alanshaw opened this issue Β· 47 comments

JS IPFS supports two types of stream at the API level, but uses pull streams for internals. If I was working on js-ipfs at the time I'd have made the same decision. Since then, async/await became part of the JS language and the majority of JavaScript runtimes now support async/await, async iterators and for/await/of (i.e. no need to transpile). These tools give us the power to stream data without needing to rely on a library.

Just because there are new language features available doesn't mean we should switch to using them. It's a significant upheaval to change the core interface spec and its implementations (js-ipfs, js-ipfs-api etc.) without good reason.

That said, it has become apparent that there are a growing number of good reasons to do this:

  • Reduction in bundle size - no need to bundle two different stream implementations, and their eco-system helper modules, no need for the async module.
  • Reduce npm install time - fewer dependencies to install.
  • Allows us to remove a bunch of plumbing code that converts Node.js streams to pull streams and vice versa.
  • Reduces API surface area, no addPullStream, addReadableStream.
  • Building an interface-ipfs-core compatible interface becomes a whole lot easier, no dual promise/callback API and no multiple stream implementation variations of the same function. It would also reduce the number of tests in the interface-ipfs-core test suite for the same reasons.
  • Node.js readable streams are now async iterators thanks to #17755!
  • Of note, it is trivial to convert from pull stream to (async) iterator and vice versa.
  • Unhandled throws that cannot be caught will no longer be a problem
  • Better stack traces, stacks no longer clipped at async boundaries, await stack traces better than promise stack traces
  • A modern, up to date and cutting edge API will aid community contributions and adoption.

The rough plan is:

  1. Drop support for dual callback/promise based APIs
  2. Expose only APIs that return promises or iterators for async actions
  3. Use async/await over then/catch when dealing with promises

This will require significant discussion and coordination from the JS teams. We'll need to reach agreement on the best API to expose for each module and manage releases carefully.

Below is a table documenting the multiformats, libp2p, ipld and ipfs modules that will likely need work. I suspect that some of these modules can be removed as they do not expose an async API. Likewise there's probably modules that got missed. If you notice either way then please edit the table or comment below.

If you'd like to own this enhancement task for a module then please comment below (or add yourself to the table if you know what you are doing). Please open a PR against the module asap (does not have to be anywhere near complete!) so we can add it here also and track progress.

  • 🍎 = Not started
  • 🍊 = In progress
  • 🍏 = Complete

Core

Multiformats

  • Current progress:
  • 🍏 2/2
  • 🍊 0/2
  • 🍎 0/2
Module PR Owner Status Priority
multihashing-async multiformats/js-multihashing-async#37 @hugomrdias 🍏 P0
multistream-select https://github.com/multiformats/js-multistream-select/releases/tag/v0.15.0 @alanshaw 🍏 P1

libp2p

libp2p/js-libp2p#266

  • Current progress:
  • 🍏 25/30
  • 🍊 3/30
  • 🍎 2/30
Module PR Owner Status Priority
peer-id libp2p/js-peer-id#87 @hacdias 🍏 P3+
peer-info libp2p/js-peer-info#67 @hacdias 🍏 P3+
libp2p https://github.com/libp2p/js-libp2p/tree/refactor/async-await @jacobheun 🍊 P0
libp2p-bootstrap libp2p/js-libp2p-bootstrap#89 @dirkmc 🍏 P3+
libp2p-crypto libp2p/js-libp2p-crypto#131 @alanshaw 🍏 P3+
libp2p-crypto-secp256k1 libp2p/js-libp2p-crypto-secp256k1#9 @alanshaw 🍏 P1
libp2p-keychain libp2p/js-libp2p-keychain#37 @jacobheun 🍏 P3+
libp2p-kad-dht libp2p/js-libp2p-kad-dht#148 libp2p/js-libp2p-kad-dht#153 @vasco-santos 🍏 P3+
libp2p-delegated-content-routing libp2p/js-libp2p-delegated-content-routing#7 @achingbrain 🍏 P3+
libp2p-delegated-peer-routing libp2p/js-libp2p-delegated-peer-routing#8 @achingbrain 🍏 P3+
libp2p-floodsub libp2p/js-libp2p-floodsub#88 @vasco-santos 🍏 P3+
libp2p-gossipsub ChainSafe/js-libp2p-gossipsub#49 @vasco-santos 🍏 P3+
libp2p-pubsub libp2p/js-libp2p-pubsub#26 @vasco-santos 🍏 P0
libp2p-secio libp2p/js-libp2p-secio#108 @mkg20001 🍏 P3+
libp2p-spdy 🍎 P1
libp2p-mplex libp2p/js-libp2p-mplex#94 @alanshaw 🍏 P1
libp2p-record libp2p/js-libp2p-record#13 @dirkmc 🍏 P1
libp2p-rendezvous 🍎 P3+
libp2p-mdns libp2p/js-libp2p-mdns#78 @dirkmc 🍏 P3+
libp2p-tcp libp2p/js-libp2p-tcp#112 @dirkmc + @alanshaw 🍏 P3+
libp2p-utp libp2p/js-libp2p-utp#81 @vasco-santos 🍊 P1
libp2p-webrtc-direct libp2p/js-libp2p-webrtc-direct#30 @vasco-santos 🍏 P1
libp2p-webrtc-star libp2p/js-libp2p-webrtc-star#183 @vasco-santos 🍏 P3+
libp2p-websocket-star libp2p/js-libp2p-websocket-star#77 @vasco-santos 🍊 P3+
libp2p-websocket-star-rendezvous libp2p/js-libp2p-websocket-star-rendezvous#35 @achingbrain 🍏 P3+
libp2p-websockets libp2p/js-libp2p-websockets#92 @alanshaw + @vasco-santos 🍏 P1
interface-connection libp2p/interface-connection#29 @vasco-santos 🍏 P0
interface-transport libp2p/interface-transport#44 @dirkmc 🍏 P0
interface-peer-discovery libp2p/interface-peer-discovery#10 @vasco-santos 🍏 P0
interface-stream-muxer libp2p/interface-stream-muxer#55 @alanshaw 🍏 P3+

IPLD

Please read #1670 (comment) before contributing.

  • Current progress:
  • 🍏 8/8
  • 🍊 0/8
  • 🍎 0/8
Module PR Owner Status Priority
ipld ipld/js-ipld#190 @vmx 🍏 P3+
ipld-bitcoin ipld/js-ipld-bitcoin#48 @vmx 🍏 P3+
ipld-dag-pb ipld/js-ipld-dag-pb#124 @vmx 🍏 P1
ipld-dag-cbor ipld/js-ipld-dag-cbor#107 @vmx 🍏 P1
ipld-ethereum ipld/js-ipld-ethereum#51 @vmx 🍏 P1
ipld-git ipld/js-ipld-git#51 @vmx 🍏 P1
ipld-raw ipld/js-ipld-raw#32 @vmx 🍏 P1
ipld-zcash ipld/js-ipld-zcash#39 @vmx 🍏 P1

IPFS

  • Current progress:
  • 🍏 19/19
  • 🍊 0/19
  • 🍎 0/19
Module PR Owner Status Priority
interface-datastore ipfs/interface-datastore#25 @alanshaw 🍏 P0
datastore-core ipfs/js-datastore-core#17 @zcstarr 🍏 P1
datastore-fs ipfs/js-datastore-fs#22 @zcstarr 🍏 P1
datastore-level ipfs/js-datastore-level#12 @alanshaw 🍏 P2
datastore-pubsub ipfs/js-datastore-pubsub#15 @vasco-santos 🍏 P3+
datastore-s3 ipfs/js-datastore-s3#17 @dirkmc 🍏 P2
interface-ipfs-core ipfs-inactive/interface-js-ipfs-core#562 @PedroMiguelSS 🍏 P3+
interop ipfs/interop#87 @PedroMiguelSS 🍏 P3+
ipfs #2495 #2379 #2517 #2547 #2658 #2659 #2660 #2661 #2664 #2665 #2666 #2668 #2671 #2672 #2683 #2724 #2726 @alanshaw / @achingbrain 🍏 P3+
ipns ipfs/js-ipns#19 @dirkmc 🍏 P3+
ipfs-http-client ipfs-inactive/js-ipfs-http-client#1059 ipfs-inactive/js-ipfs-http-client#1087 ipfs-inactive/js-ipfs-http-client#1101 ipfs-inactive/js-ipfs-http-client#1124 ipfs-inactive/js-ipfs-http-client#1149 ipfs-inactive/js-ipfs-http-client#1153 ipfs-inactive/js-ipfs-http-client#1154 ipfs-inactive/js-ipfs-http-client#1155 ipfs-inactive/js-ipfs-http-client#1156 ipfs-inactive/js-ipfs-http-client#1157 ipfs-inactive/js-ipfs-http-client#1161 ipfs-inactive/js-ipfs-http-client#1160 ipfs-inactive/js-ipfs-http-client#1164 ipfs-inactive/js-ipfs-http-client#1165 ipfs-inactive/js-ipfs-http-client#1166 ipfs-inactive/js-ipfs-http-client#1168 ipfs-inactive/js-ipfs-http-client#1169 ipfs-inactive/js-ipfs-http-client#1170 ipfs-inactive/js-ipfs-http-client#1172 ipfs-inactive/js-ipfs-http-client#1173 ipfs-inactive/js-ipfs-http-client#1174 ipfs-inactive/js-ipfs-http-client#1183 @alanshaw / @achingbrain 🍏 P3+
ipfs-bitswap ipfs/js-ipfs-bitswap#202 @achingbrain 🍏 P3+
ipfs-block-service ipfs/js-ipfs-block-service#85 @dirkmc 🍏 P3+
ipfs-http-response ipfs/js-ipfs-http-response#28 @PedroMiguelSS 🍏 P1
ipfs-mfs ipfs-inactive/js-ipfs-mfs#49 @achingbrain 🍏 P3+
ipfs-multipart ipfs-inactive/js-ipfs-multipart#17 @achingbrain 🍏 P0
ipfs-repo ipfs/js-ipfs-repo#189 @zcstarr 🍏 P3+
ipfs-unixfs-exporter ipfs-inactive/js-ipfs-unixfs-exporter#21 @achingbrain 🍏 P3+
ipfs-unixfs-importer ipfs-inactive/js-ipfs-unixfs-importer#24 @achingbrain 🍏 P3+

Dependents

These modules use IPFS and fall under the ipfs/ipfs-shipyard umbrella so should also be updated.

Current progress:

  • 🍏 1/44
  • 🍊 2/44
  • 🍎 40/44
Module PR Owner Status Priority
awesome-ipfs 🍎
benchmark-js.ipfs.io 🍎
cid-utils-website 🍎
demo-ipfs-todo 🍎
ipfs-companion @lidel 🍎
ipfs-desktop 🍎
ipfs-fuse @alanshaw 🍎
ipfs-geoip ipfs-shipyard/ipfs-geoip#67 @nijynot 🍊 P0
ipfs-iiif-db 🍎
ipfs-level 🍎
ipfs-blob-store ipfs-shipyard/ipfs-blob-store#26 @niinpatel 🍊 P3+
ipfs-locations 🍎
ipfs-performance-profiling 🍎
ipfs-pubsub-peer-monitor 🍎
ipfs-pubsub-room 🍎
ipfs-pubsub-room-demo 🍎
ipfs-redux-bundle 🍎
ipfs-registry-mirror 🍎
ipfs-postmsg-proxy @alanshaw 🍎
ipfs-pubsub-1on1 🍎
ipfs-service-worker @vasco-santos 🍎
ipfs-share-files 🍎
ipfs-stats 🍎
ipfs-webui 🍎
ipfsd-ctl ipfs/js-ipfsd-ctl#353 @achingbrain 🍏
ipld-explorer 🍎
ipld-explorer-cli @alanshaw 🍎
ipld-explorer-components 🍎
ipscend 🍎
npm-on-ipfs @achingbrain 🍎
peer-crdt-textarea-binding 🍎
peer-flipchart 🍎
peer-pad-core 🍎
peer-star-app 🍎
peer-star-network-vis 🍎
peer-star-network-vis-react 🍎
peer-star-peer-color 🍎
peer-star-react 🍎
peerpad-peer-crdt 🍎
service-worker-gateway 🍎
tevere 🍎
window.ipfs-fallback @alanshaw 🍎
y-ipfs-connector 🍎
ipld-graph-builder 🍎 P3+

This is a "Everyone is Welcome" effort and a excellent opportunity for everyone to get their feet wet on the codebase, improve documentation, testing and refresh the code.

Hoping that we can count with help from the IPFS GUI, IPFS in Web Browsers, Community WG and Dynamic Data & Capabilities. This will make our JS APIs better for all the modern/new JS developers!

@ipfs/javascript-team Unite βœ¨πŸ‘‹πŸ½βš‘οΈ

this is awesome and makes me so happy to see πŸŽ‰

add me to multihashing-async, ipfsd-ctl, ipfs-multipart

@hugomrdias you got it! :)

Updated the table with my assignements

@alanshaw I'm taking a stab at libp2p/js-peer-id#87 πŸ˜„

Update: I added myself to peer-info too.

@alanshaw I was thinking at taking a look at ipld-* and I noticed interface-ipld-format is missing from the table. Despite not being code, it is also required to be updated. I'll take a look at that one, does it make sense?

RE: IPLD

We (@vmx and I) aren't too happy with interface-ipld-format and have been talking about a re-write for a some time now. Originally we had intended to use async functions as a test case "before advocating that all of js-ipfs moves to it" but you're beating us to it now :)

Perhaps we should accelerate the timeline for this refactor rather than trying to update the current interface?

@mikeal it would be great if you, @vmx and @hacdias could work together on that refactor/redesign. I is a great opportunity for @hacdias to learn the IPLD way and learn from you two :)

Yeah, that would be awesome. @mikeal @vmx as you can see by my previous comment, I already created a PR on interface-ipld-format. Maybe we could continue the discussion there!

it would be great if you, @vmx and @hacdias could work together on that refactor/redesign

Sounds great, but I do want to make sure I'm prioritizing things in the right order (there's a lot on my plate from being out on vacation and now I'm sick). Is this refactor already blocking downstream refactors?

@mikeal as far as I know, it isn't since no one started refactoring IPLD nor anything that depends on it.

Just a note that we should use this opportunity to remove the object API as per the plan here ipfs-inactive/interface-js-ipfs-core#388 (comment)

If we're removing redundant/superceded APIs, there are some more suggestions here: https://github.com/ipfs-shipyard/ipfs-http/issues

i'll pick up ipfs-repo

Hi I'm new πŸ‘‹ I'll pickup ipld-git ipld/js-ipld-git#38

I do have a question. The entire library is sync with setImmediates for most callbacks. With async/await you are free to await on a value that isn't a promise with no ill effect. Should ipld-git return promises to keep a spec? or will consumers of the ipld-* interface be using async/await and therefor tolerate a non promise values?

Thanks!

@reconbot awesome! The IPLD refactor is going to be a bigger effort than a simple switch to async/await and any PRs you make should be done in the context of ipld/js-ipld#185. @vmx is the champion for JS IPLD. @vmx, do you have a moment to focus @reconbot's energies in the right places?!

vmx commented

As @alanshaw mentions, there's bigger changes in the IPLD API (ipld/js-ipld#185) and IPLD Formats API (ipld/interface-ipld-format#50) upcoming. I'm actively working on those. Once the new IPLD API is there (I'm working on it), we can tackle the IPLD Formats one.

To all contributors. Please hold on for a moment and don't spend time on making the IPLD Formats Promise/async-await based. I'll post an update here, once it makes sense to work on it.

No worries it was worth it for me to dive in and figure it out. All things considered it was a pretty small straightforward change. And I figured there’s a huge unwritten dependency tree to the rewriting process.

I was looking at some of the libp2p stuff next but I imagine it’s got a similar api redesign going on?

We'll be evaluating the js-libp2p api itself, and likely js-libp2p-switch as a result of that. There are some of the interface modules that are a blocker for the transports that we should be able to get done though. Any potential api changes there shouldn't be a blocker for moving them to async. In particular: interface-connection, interface-stream-muxer, and interface-transport. They weren't previously in the table, so I added them. Those would help free up work for many of the other libp2p modules.

Hi, just refactored ipfs-geoip (ipfs-shipyard/ipfs-geoip#67).
I'll try to take on ipfs-bitswap too if that's fine.

@nijynot awesome thank you πŸ™

I'll put you down for bitswap πŸš€

vmx commented

For all working on this endeavour. In case you refactor parts which is still using the async module, e.g. a waterfall, you need to be careful.

I thought I can replace a call like

(files, callback) => ipld.get(…, callback),

with (where first() returns a Promise):

async (files) => ipld.get(…).first(),

This works in Node.js, but it won't work with our Browser build as we transpile to older ES versions.

Instead you need to resolve the Promise like this:

(files, callback) => ipld.get(…).first().then(
  (node) => callback(null, node),
  (error) => callback(error)
)

Update: You can also use asyncify (see https://caolan.github.io/async/global.html for more information):

const asyncify = require('async/asyncify')
…
asyncify(async (files) => ipld.get(…).first()),

If you want to know if your current code works in the Browser you need to run the tests in production mode:

NODE_ENV=production npm run test

(this needs ipfs/aegir#325 to be fixed).

Credit for finding this issue goes to @achingbrain as he mentioned the transpiling issue during code review.

Please make an issue in aegir about this, with a generic repro and pointing to terser as the problem. This is probably fixable.
BTW next release aegir will have minification in the karma tests.

Following is a list of modules that I created (that probably don't exist in on npm otherwise) and needed in pursuit of this endeavour. They might be useful to you too 😁:

...and some that I found that I didn't have to write that were useful:

Please comment with yours...

For a similar endeavor on a different project I made https://github.com/bustle/streaming-iterables we had a lot of stream based pipelines that needed replacing but it’s started getting used all over the place.

vmx commented

If you want to promisify something, please use promisify-es6 and not the Node.js built-in one to keep the bundle size small.

@vmx there are performance advantages in Node.js to using the builtin. Can we write a tiny intermediary library that tells webpack/browserify to replace the builtin with promisify-es6 and use that instead?

vmx commented

@mikeal Would be cool, but I think currently the bundle size matters more. If anyone finds the time, the polyfill could be extracted from the one webpack is using: https://github.com/defunctzombie/node-util/blob/fb06a0f973f0203762714dc16c19d4d0644d6fb0/util.js#L600

I've made PRs against a few repos that have not yet been captured in the table above:

I'm doing ipfs-blob-store: ipfs-shipyard/ipfs-blob-store#26

i like the pify module
one usage is await pify(cb => oldThing(arg1,arg2,cb))()
not sure what promisify module you're standardizing on but this style may also work

For the async refactors of larger modules, I recommend taking an iterative approach

I tried my hand at one here, please critique libp2p/js-libp2p-kad-dht#108

@kumavis ideally we'd tackle the list according to the priority so that no single module refactor is blocked by other module refactors and also to minimise the time a large refactor PR stays open.

We need to consider carefully any work we do to allow both implementations to exist at the same time since it is temporary (will be removed when everything switches), can have perf/maintenance implications and can be confusing to new contributors.

I appreciate it isn't always possible and the DHT is a tricky case bacause there's a lot of active development trying to land this feature. In hindsight the async await refactor here probably shouldn't have been attempted yet (considering it's priority is P3+).

@alanshaw thats what i like about my iterative non-breaking approach

  • 🏑 no breaking changes to public interface or behavior, only internal changes
  • πŸ› οΈ wrap consumed apis in promisify/etc
  • πŸ₯… not blocked by dependencies
  • πŸŽ‰ ready to merge now
  • πŸ“¦ async refactor of large module can be done across a few PRs to make review easier and avoid conflicts with ongoing work

when the external API is ready to be moved to async, you're just removing the promisify/callbackify layer, and looks a little something like this

where is the ipld-selector module listed in the chart? I can't find the repo and there is not a module on npm by that name

wanted to see the remaining red repos by priority, heres what the top ones currently look like

### P0
libp2p-connection-manager
libp2p-pubsub
libp2p-pnet
ipfs-multipart
### P1
libp2p-spdy
libp2p-utp
libp2p-webrtc-direct
ipfs-http-response

try/catch/finally felt awkward in some cases where callbacks were more flexible

for example when checking an abort signal and possibly ignoring an error

let res, queryError
try {
  res = await this.path.queryFuncAsync(peer)
} catch (err) {
  queryError = err
} 

// Abort and ignore any error if we're no longer running
if (!this.running) {
  return
}

if (queryError) {
  this.run.errors.push(queryError)
  return
}

// continue, using res

heres an alternative inspired by rust syntax

const [res, err] = await tryCatch(() => this.path.queryFuncAsync(peer))

// Abort and ignore any error if we're no longer running
if (!this.running) {
  return
}

if (err) {
  this.run.errors.push(err)
  return
}

// continue, using res

where tryCatch is

async function tryCatch (fn) {
  try { return [ await fn() ] } catch (err) { return [undefined, err] }
} 

You can use promises interface in these cases

I think it’s better to keep the JS idiomatic

let res

try {
  res = await this.path.queryFuncAsync(peer)
} catch (err) {
  if (!this.running) {
    // Abort and ignore any error if we're no longer running
    return
  }

  this.run.errors.push(err)
}

// continue, using res

Though it’s hard to see how you would continue using res in this case at it would be undefined, but I’m missing context..

Hi y'all πŸ‘‹πŸ½ AFAIU, this transition would something that would make everyone more productive and happy. Is there anything we can do to accelerate it? For example, could we have one week in which the team does nothing else other than shipping this refactor?

Yes, that would push it along a fair bit I reckon! It's taking so long because it's not top of anyone's priority list.

Totally agree πŸ‘

When I started making async/await PRs I took advantage to fix bugs and make improvements to the code, but actually I think it just makes it more confusing for reviewers and slows the process down.

I noticed that @achingbrain followed a strategy of purely changing from callbacks to async/await. He would open issues for any bugs or improvements so they could be worked on separately, I think that's the best approach.

Just wanted to shout my appreciation and support to everyone pushing for this effort. I've went through yet another spelunking through the refactors and OHMY, so much effort here! Excited to see the result! ❀️

Thank you EVERYONE who contributed to this. It has been the biggest journey I've ever taken - you've all been with me the whole way and you're all incredible. There's still some bits to finish up but expect to see all this goodness in js-ipfs 0.41 soon.

I'll also write a blog post and try to note down some of the many wins this work has made and enabled for the future.

Once again ❀️ to you all.