Promise Performance Meeting
benjamingr opened this issue Β· 33 comments
I would like to set up a hangout with the V8 team regarding promise performance. I've already talked to @bmeurer, the person in charge on V8's side, regarding a meeting and he has been gracious enough to agree to set one up.
By all means, if the discussion here gets noisy, feel free to unsubscribe - we will discuss this in the hangout.
Outstanding issues I'd like to discuss
- Any hooks we need in the microtask queue (for example for the VM issue) and how they might impact performance.
- The overhead of creation and how we can make `util.promisify` faster to a point where native promises win against Bluebird 3.5 in the doxbee benchmark.
- Common use cases we have gathered from the thousand+ questions in the promise tag on StackOverflow and their performance profile.
- Performance cliffs and gotchas that relate to promise code "in the wild" and how we can deliver a better experience to users.
Outstanding issues that are interesting, but not necessarily for the meeting
- The promises/post-mortem issue that's been blocking integration and whether or not we might be able to get help from the V8 side.
- ETAs and stability notes on new promise features (async iterators, promise#finally etc) so we can include those in the relevant APIs (we're mostly prepared though).
- Async iterator performance profiles (basically, async iterators are a little useless in performance-sensitive scenarios if the iteration is not inlined and optimized somehow, because of the overhead of a promise + iterator object per call).
Participation
In addition to nodejs/promises members - pinging @petkaantonov @nodejs/performance, @addaleax, @caitp and @misterdjules.
I'd like to brainstorm here for a few days and then set up the hangout. I'd like to limit participation to a few people - I'd love to attend to share the cliffs and bluebird experience, I'd love @addaleax and @mcollina to attend from Node's side and @bmeurer obviously :) If anyone would like to attend because they feel their viewpoint won't otherwise be represented, please comment here with why you would like to attend.
Optimally most discussion before the meeting should happen here.
I do not think we should care for the bluebird benchmark at all, mostly because of this: petkaantonov/bluebird#887 (comment).
We probably need new benchmarks that are independent from the ones in bluebird - IMHO they are built to make bluebird shine.
More importantly, I think the best approach is to focus on async/await benchmarking and performance. This is what people will use, and one of the main drivers for adopting native promises in modules and applications.
I'm +1 in joining, typically CEST business hours works well for me, but I can also join later in the evening.
We probably need new benchmarks that are independent from the ones in bluebird - IMHO they are built to make bluebird shine.
Actually, the benchmarks predate bluebird (so it's more accurate to say that Bluebird is built to make the benchmarks shine). Bluebird is not in the best maintenance state after 4 years, but it is the benchmark that V8 is using internally for promise performance. I didn't suggest them because they are terrific. We can also extract the benchmark from https://github.com/petkaantonov/bluebird/tree/master/benchmark/doxbee-sequential or approach @spion directly about the suite - I assume V8 did the former.
I'm also +1 on writing better applicative promise benchmarks that simulate real world loads. I think we'll discover quickly that our bottleneck is still `util.promisify` and the fact that V8 schedules `then` on the microtask queue where bluebird doesn't if it can avoid it (by tagging the code).
I think V8 can definitely win vs. bluebird with TurboFan inlining promise calls and async/await based optimizations (I'm sure TurboFan, unlike Crankshaft, is fully capable of that level of optimization after reading the optimization code). Moreover, the V8 team are willing to listen to Node.js performance concerns now (which is awesome). I think async/await can win against callbacks (as a long term goal) but that's a farther point.
Actually, the benchmarks predate bluebird (so it's more accurate to say that Bluebird is built to make the benchmarks shine). Bluebird is not in the best maintenance state after 4 years, but it is the benchmark that V8 is using internally for promise performance. I didn't suggest them because they are terrific. We can also extract the benchmark from https://github.com/petkaantonov/bluebird/tree/master/benchmark/doxbee-sequential or approach @spion directly about the suite - I assume V8 did the former.
I'm ok if we move them/extract them to another repo, the fact that they currently live within bluebird makes them not fair.
I'm also +1 on writing better applicative promise benchmarks that simulate real world loads.
Definitely 👍.
I think we'll discover quickly that our bottleneck is still util.promisify

`util.promisify` is only relevant to convert callback APIs. However, most of the CPU time in applications is spent in business logic or modules/frameworks. I would expect some of them to be natively written with async/await. How does that compare? What's the difference between writing native async/await code and using `util.promisify`?
and the fact that V8 schedules then on the microtask queue where bluebird doesn't if it can avoid it (by tagging the code).
But that's not spec-compliant, right?
I think V8 can definitely win vs. bluebird with TurboFan inlining promise calls and async/await based optimizations (I'm sure TurboFan, unlike Crankshaft is fully capable of that level of optimization after reading the optimization code).
I can confirm this. It's very impressive and it took me some time to understand why some async/await code was faster than the equivalent callback-based.
I would expect some of them to be natively written with async/await. How does that compare? What's the difference between writing native async/await code and using `util.promisify`?
Well that's a good point. Today our entire Node.js platform is using callbacks; for people using a promisified Node.js core (with `util.promisify`) the bottleneck of `util.promisify` is real. At Peer5, for example, we very commonly have hundreds of thousands of promises in flight at once, and the overhead of every single I/O method being a little bit slower can be significant.
Once we deliver a promisified core API a faster `util.promisify` won't be a big priority. I think that there is still a lot of low-hanging fruit that can be picked to make native promises faster than bluebird without discussing a promisified core.
But that's not spec-compliant, right?
Sure it is, I'll give an example:

```js
new Promise(r => setTimeout(r, 1)).then(x => console.log("Got here"))
```

In this case, bluebird sees that `new Promise` did not resolve synchronously and thus does not defer the `console.log` to the next tick (it was already deferred). It also does this optimization when promisifying.
By comparison, Node or V8 makes no such assumptions and always defers `.then` callbacks to the microtask queue even if they could run synchronously (since we know they run after "only platform code" remained on the stack, as required).
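A small illustration of that always-defer behavior (a sketch; this is not bluebird's actual tagging mechanism): even when a promise is already resolved, a native `.then` callback is still deferred to the microtask queue, so synchronous code after it runs first.

```javascript
'use strict';
const order = [];

// The promise is already resolved, but the native implementation still
// defers the .then callback to the microtask queue.
Promise.resolve('value').then(() => order.push('then callback'));

// This synchronous line therefore runs before the callback above.
order.push('sync code');

// Once the microtask queue has drained, both entries are present.
setImmediate(() => {
  console.log(order); // ['sync code', 'then callback']
});
```

Bluebird's optimization is to skip that deferral when it can prove the callback was already deferred by something else (like a timer or I/O completion).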
Scheduling everything on nextTick can have measurable and perhaps even considerable overhead. Although I think promisification is what dominates the performance characteristics of benchmarks that make hundreds of thousands of calls.
I'd like promises to be a zero-overhead abstraction in Node. Bluebird is close - but I think native promises can win here and I'm rooting for them. With the help of Benedikt I think that's a very reasonable goal.
Other than creation of promises and a few optimization tricks - I think V8 promises can be very competitive. I'm looking for additional pain points to optimize for before the meeting - any input (from you in particular) would be very useful here.
Has anyone tried to benchmark the current implementation of `util.promisify` vs. one written with the functions provided by V8 extras?
Once we deliver a promisified core API a faster util.promisify won't be a big priority. I think that there is still a lot of low-hanging fruit that can be picked to make native promises faster than bluebird without discussing a promisified core.
I agree.
By comparison Node or V8 makes no such assumptions and always defers .then callbacks to the microtask queue even if they can run synchronously (since we know they run after "only platform code" remained on the stack as required).
Can V8 implement the same behavior?
@targos I'm unfamiliar with that approach - if @addaleax chimes in she might have done it. I benchmarked `util.promisify` vs a userland promisify and it was measurably faster on my computer. The benchmarks are here.

With `util.promisify`:

```
promises-ecmascript6-native.js    664    60.14
```

And without `util.promisify`:

```
promises-ecmascript6-native.js    879    63.53
```
One nice side effect is that promises beat caolan/async with Node at the moment. Those benchmarks are on older Node. I'll run them again and compare (If you create a fork with a promisify based on another approach I can check that out).
Will run the tests with Node 8.4 and post the results here.
Can V8 implement the same behavior?
I don't see why not, they control the promise constructor and the `then` and can provide a hook for promisify. I'd like to gather a bunch of possible optimizations we do at Bluebird for the call and present them to the V8 team.
By the way, how do you feel about forking Bluebird under the Node org (possibly in nodejs/promises), removing the excess code and making a benchmark here where we can maintain it? I hope you realize Petka didn't close that PR because he was doing anything fishy - he is just a very busy individual without a lot of time to maintain the non-critical parts. We're all Node collaborators (me, you and Petka) with the same goal (faster core promises).
I created two versions of native promises (one with `util.promisify` and one without) and removed everything but bluebird and native promises:
```
file                                       time(ms)  memory(MB)
callbacks-baseline.js                           162       32.80
promises-bluebird-generator.js                  257       40.44
promises-bluebird.js                            316       45.18
promises-ecmascript6-native-promisify.js        538       63.64
promises-ecmascript6-native-base.js             659      109.70
promises-ecmascript6-asyncawait.js              772      135.84

Platform info:
Darwin 16.7.0 x64
Node.JS 8.4.0
V8 6.0.286.52
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz × 8
```
I'm trying to figure out what's the best place to upload said benchmark. It simulates doing several I/O operations and committing to a database. Going to check out `node-v8` on V8 6.2.248 next and see what numbers I get on it.
Same hardware - running the latest canary:
```
file                                       time(ms)  memory(MB)
callbacks-baseline.js                           152       28.04
promises-bluebird-generator.js                  235       40.29
promises-bluebird.js                            283       48.36
promises-ecmascript6-native-base.js             432       98.27
promises-ecmascript6-native-promisify.js        482       65.26
promises-ecmascript6-asyncawait.js              500      119.54

Platform info:
Darwin 16.7.0 x64
Node.JS 9.0.0-pre
V8 6.2.248
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz × 8
```
Looks like everything got faster, but native promisify is actually faster than `util.promisify`, which is a very surprising result. It looks like the memory consumption of native promises in particular got a lot better.
As a comparison, here is 8.1 on the same hardware (still running TF on the async/await one):
```
file                                       time(ms)  memory(MB)
callbacks-baseline.js                           138       30.59
promises-bluebird-generator.js                  269       41.82
promises-bluebird.js                            306       47.99
promises-ecmascript6-native-promisify.js        577       68.37
promises-ecmascript6-native-base.js             642       98.47
promises-ecmascript6-asyncawait.js              755      117.48

Platform info:
Darwin 16.7.0 x64
Node.JS 8.1.3
V8 5.8.283.41
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz × 8
```
Numbers on Windows:
```
file                                       time(ms)  memory(MB)
callbacks-baseline.js                           149       33.54
promises-bluebird-generator.js                  306       41.39
promises-bluebird.js                            378       45.06
async-bluebird.js                               523       73.09
promises-es2015-util.promisify.js               598       68.02
async-es2017-util.promisify.js                  604       77.70
promises-es2015-native.js                       651      110.50
async-es2017-native.js                          653      109.88

Platform info:
Windows_NT 10.0.15063 x64
Node.JS 8.4.0
V8 6.0.286.52
Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz × 4
```
Source: https://github.com/kyrylkov/promise-async-performance
@kyrylkov thanks! Given I did not share the code the fact we got similar results is reassuring. For comparison and until we decide where to put it - here is the patch I applied https://gist.github.com/benjamingr/a7c7cd7ff2ee69d3a71f0058d286ff9a#file-1-diff-L617-L675
@benjamingr Here is the code https://github.com/kyrylkov/promise-async-performance
@benjamingr you can try with this branch: https://github.com/targos/node/commits/expose-v8-extras
I checked with a very simplistic benchmark and the V8 extras seem to bring a non-negligible improvement:
```js
'use strict';

function asyncF(cb) {
  cb();
}

const { promisify } = require('util');
const promiseF = promisify(asyncF);

(async function() {
  for (var i = 0; i < 1e6; i++) await promiseF();
  console.time('test');
  for (var i = 0; i < 1e6; i++) await promiseF();
  console.timeEnd('test');
})();
```
Results:
- 8.4.0: 804.281ms
- master: 805.989ms
- master with V8 extras: 654.518ms
Thanks @targos, that's very interesting. Running those on the real (doxbee) benchmark after building Node from your source at that branch I get:
```
results for 10000 parallel executions, 1 ms per I/O op

file                                       time(ms)  memory(MB)
callbacks-baseline.js                           178       35.48
promises-bluebird-generator.js                  271       42.20
promises-bluebird.js                            332       47.52
promises-ecmascript6-native-promisify.js        572       69.91
promises-ecmascript6-native-base.js             603       94.12
promises-bluebird-asyncawait.js                 645      104.20
promises-ecmascript6-asyncawait.js              677      103.06

Platform info:
Darwin 16.7.0 x64
Node.JS 9.0.0-pre
V8 6.0.286.52
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz × 8
```
Looks like on V8 6.0 native promisify from v8-extras wins (PR welcome).
Here are the results from your branch merged into node-v8 canary (so on top of v8 6.2):
```
results for 10000 parallel executions, 1 ms per I/O op

file                                       time(ms)  memory(MB)
callbacks-baseline.js                           148       27.93
promises-bluebird-generator.js                  236       40.37
promises-bluebird.js                            290       48.64
promises-bluebird-asyncawait.js                 416       89.74
promises-ecmascript6-native-base.js             474       94.57
promises-ecmascript6-asyncawait.js              493       96.20
promises-ecmascript6-native-promisify.js        512       73.68

Platform info:
Darwin 16.7.0 x64
Node.JS 9.0.0-pre
V8 6.2.248
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz × 8
```
@kyrylkov thanks! That's very useful, it's also interesting to benchmark bluebird promises with native async/await (interesting result, faster than native async/await)
@benjamingr not sure I'm following. My branch is on V8 6.0, not 6.2.
@targos updated the above with benchmarks both with V8 6.0 and 6.2 (6.2 by doing a git pull from your branch on top of node-v8's canary branch).
Regarding async iterators, I would like to get the best-performing implementation for streams. Maybe we should talk about that as well.
I'm happy to attend the meeting as well. I'm in PST though. Leaving some inline comments if I can't make it --
- Any hooks we need in the microtask queue (for example for the VM issue) and how they might impact performance.
@matthewloring did some preliminary micro benchmarking and found looping over `Promise.resolve(42);` to be 10x slower compared to not using PromiseHooks, but fixed it with https://codereview.chromium.org/2889863002/ (I don't know if node was updated to use this). I believe this brought down the regression to 2-3x. This is expected as we call out to the runtime for each promise hook.
- The overhead of creation and how we can make util.promisify faster to a point where native promises win against Bluebird 3.5 in the doxbee benchmark.
util.promisify is implemented using the C++ API which is probably causing most of this overhead. The best way to address this would be for V8 to implement util.promisify using our assembler and expose it using v8-extras.
- Performance cliffs and gotchas that relate to promise code "in the wild" and how we can deliver a better experience to users.
The biggest perf cliff is modifying the promise prototype or a promise instance. This would cause all our fast paths to fail since it's guarded by a map check. (This is true for most builtins in V8).
- ETAs and stability notes on new promise features (async iterators, promise#finally etc) so we can include those in the relevant APIs (we're mostly prepared though).
The general policy is to implement stage 3 proposals, but we don't have a specific timeline for each proposal. I'm currently working on updating V8's implementation of promise#finally to match the recent spec changes. @caitp has been working hard on async iterators; will defer to them about its status.
There are some more optimizations possible for native promises both for perf and memory (such as https://bugs.chromium.org/p/v8/issues/detail?id=6468). But I agree with @mcollina that it might be better to focus on async await and optimize promises as part of that process.
As an aside, native promises have to be spec-compatible, which requires doing additional work (like looking up the species constructor and maintaining strict ordering in the microtask queue) that polyfills can get away with skipping. Here's another microbenchmark by Wikimedia that doesn't use util.promisify (but suffers from other problems of microbenchmarking) - https://github.com/wikimedia/web-stream-util/pull/3
The general policy is to implement stage 3 proposals, but we don't have a specific timeline for each proposal. I'm currently working on updating V8's implementation of promise#finally to match the recent spec changes. @caitp has been working hard on async iterators; will defer to them about its status.
I'm not going to be attending a meeting, but here's a quick update on how performance looks right now:
`generator.next()` / `.throw()` / `.return()`
- Allocates a Promise and an AsyncGeneratorRequest separately (could be combined into a single allocation, which is not wasteful in the common case --- but that's a very minor improvement)
- O(N) algorithm to append a request to the queue of requests
  - Could be O(1), but there are some downsides to this
- Resuming with `.return()` essentially performs `await <sent value>`. This no longer results in increased bytecode size in V8 (as of https://chromium-review.googlesource.com/c/582218), but still results in the allocation of 2 closures and 2 Promises, and a call into C++ to figure out catch prediction (though this only matters for debugging), which takes a bit of extra time regardless of whether the debugger is enabled or not.
`yield <operand>`
- Allocates 2 closures and 2 promises. We could add another fast case to PromiseHandle to prevent needing the two closures. Getting rid of the Promise allocations is more complicated.
`return <operand>`
- Awaits the operand, allocating 2x closures and 2x promises again, and emits all the extra bytecode that a syntactic `await` normally emits.
`yield* <operand>`
- Emits quite a lot of bytecode. Spec requires that `innerResult` is awaited, and `innerResult.value` is also awaited. Each of those awaits incurs allocation of 2x closures and 2x Promises, which is a fair bit. The amount of bytecode generated is not optimal, emitting bytecode for 2 full Awaits - quite a few more bytes than a sync generator's `yield*` would be.
`await <operand>`
- Again, the 2x closures and 2x Promises.
So, async generators allocate a lot of objects during their runtime. In many cases, these don't migrate to old space, but it's a stress on the GC to track them. Closures may be possible to eliminate with special handlers in `PromiseHandle`, but it's hard to eliminate or sink allocation of the Promise objects (I don't have ideas about this yet).
Other than those issues, async generators share a lot of the same issues that native Promises share.
These are not based on measurements/benchmarks of any kind, as I haven't found any suitable benchmarks or applications which make use of async generators just yet. A real application which uses them would help a lot to identify problems --- all of the performance work so far has really been based on reducing generated code and snapshot size, and not on actual application performance.
cc @psmarshall
Thanks for chiming in! I think a nice design goal would be to optimize the 99% use case of using native promises only listened to by one observer in an `async` function - and promise creation. I think the benchmark is pretty representative and you can check the current results with 5.8, 6.0 and 6.2 with and without `util.promisify` here and with v8-extras here.
util.promisify is implemented using the C++ API which is probably causing most of this overhead. The best way to address this would be for V8 to implement util.promisify using our assembler and expose it using v8-extras.
That would be amazing; a faster `util.promisify` in core is definitely something that could be a big win. Bluebird does code generation, which using the assembler directly can definitely beat or at least match.
The biggest perf cliff is modifying the promise prototype or a promise instance. This would cause all our fast paths to fail since it's guarded by a map check. (This is true for most builtins in V8).
In my experience this is not very common, and I think we should (at least initially) aspire to fast promises with `Promise.prototype` and consider subclasses/modification when we encounter them in the wild. While subclassing is viable - I haven't actually seen it. I haven't seen one subclass in over 2000 questions on StackOverflow - and only around 2-3 mention it. It has also been asked about very rarely in the bluebird tracker.
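As an illustration of the cliff being described (a sketch; the exact deoptimization behavior is internal to V8): code like the following works fine functionally, but patching the prototype changes the map that guards V8's promise fast paths, so subsequent promise operations take the slow path.

```javascript
'use strict';

// Functionally fine, but exactly the pattern that defeats V8's fast paths:
// the map of Promise.prototype changes, so the map-check guard fails
// for every promise in the process from here on.
Promise.prototype.delay = function (ms) {
  return this.then(value =>
    new Promise(resolve => setTimeout(() => resolve(value), ms)));
};

Promise.resolve(42).delay(1).then(v => console.log(v)); // logs 42
```

The observation above is that this pattern is rare in the wild, so optimizing the unmodified-prototype case first is the right priority.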
Here's another microbenchmark by wikipedia that doesn't use util.promisify (but suffers from other problems of microbenchmarking) - wikimedia/web-stream-util#3
Thanks, I'll take a look and compare it with doxbee and see if I can run it on the scenarios above.
But I agree with @mcollina that it might be better to focus on async await and optimize promises as part of that process.
I agree. How do you feel about the optimization discussed in this comment (after "But that's not spec-compliant, right?")? I think it could be a nice performance win with measurable impact on async/await.
As an aside, native promises have to be spec-compatible, which requires doing additional work (like looking up the species constructor and maintaining strict ordering in the microtask queue) that polyfills can get away with skipping.
I'm not sure what's hard about maintaining strict ordering in the microtask queue - in Node I think we queue our own microtasks and I don't recall it being a burden. At bluebird we schedule on `setImmediate` historically (because of 0.8 support among other things) and trampoline all our method calls to one queue call with a zero-allocation data structure.
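A simplified sketch of that trampolining idea (not bluebird's actual implementation, which uses a preallocated circular buffer): callbacks accumulate in a queue, and a single scheduled tick drains them all, so N callbacks cost one scheduler call instead of N.

```javascript
'use strict';

// Minimal trampoline: many enqueued callbacks, one setImmediate call.
const queue = [];
let scheduled = false;

function enqueue(fn) {
  queue.push(fn);
  if (!scheduled) {
    scheduled = true;
    setImmediate(drain); // one scheduler call for the whole batch
  }
}

function drain() {
  // Run everything queued so far, including callbacks queued while draining.
  for (let i = 0; i < queue.length; i++) queue[i]();
  queue.length = 0;
  scheduled = false;
}

// Usage: three callbacks, but only one trip through the scheduler.
enqueue(() => console.log('a'));
enqueue(() => console.log('b'));
enqueue(() => console.log('c'));
```

The win is that the per-callback cost becomes a plain array push rather than a scheduler round-trip.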
By the way, looking at these results, do you @caitp or @littledan happen to know why `async/await` is so much slower than `Promise.coroutine`? What's the additional overhead here and can we inline it?
Note that as an interesting (perhaps) observation - we can either inline a promise's value because it is available when it is awaited - or we don't have to defer the async function when resolving it. I'm wondering if we can consider a fast path for native promises in async functions that might be able to improve the results.
@caitp thanks for that, some observations I would love feedback on:
`generator.next()`

O(N) algorithm to append request to queue of requests

I'd assume in 99.9% of real-world cases there is no queue here, you are only waiting for one value at a time, and I'd assume in 99% of cases that would be inside a `for await` loop. I think that if we can optimize `generator.next()` and the others specifically for the case where it is being `for await`ed, like we optimize `for of` for arrays, it could be a huge win.

In the case where we're `for await`ing on an asynchronous generator (rather than a custom-built async iterator) - can't we fast-path and not allocate any promises? (since they are invisible to the user, who can't react to them).

Note, I am specifically referring to the case where an async iterator is consumed by a `for await` loop and the yielded values are not exposed anywhere else. I'm only talking about the promises `next` returns here (and not the value if it is wrapped). I think that could be a nice win.
I'd assume in 99.9% of real-world cases there is no queue here, you are only waiting for one value at a time, and I'd assume in 99% of cases that would be inside a for await loop. I think that if we can optimize generator.next() and the others specifically for the case where it is being for awaited, like we optimize for of for arrays, it could be a huge win.
In the case where there's only one request, this is essentially
```cpp
Label storeFirstRequest;
Node firstRequest = Load(generator, offsetof(GeneratorObject, requestQueue));
JumpIfUndefined(firstRequest, &storeFirstRequest);
// ... Loop to find the last request and store ...
storeFirstRequest.Bind();
Store(generator, offsetof(GeneratorObject, requestQueue), request);
```
I don't think this is too bad. You could shrink (? might be optimistic) the code a bit by moving the slow case into a separate stub or something, but this is likely worse than the inlined loop. Using an array to represent the queue would be similar: you'd have to load the field, test if it's undefined, test if it's a FixedArray, and emit code to handle those cases and the third case --- or always store in a FixedArray even for single requests, in which case you have additional loads/stores and length tracking + realloc code.
Once we have an opportunity to actually measure the best option, we can make more informed choices. But the singly linked list seems like the best choice for the common case, in terms of instructions used.
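For readers following along in JavaScript rather than CodeStubAssembler, the common-case logic above amounts to something like this sketch (field and variable names are made up):

```javascript
'use strict';

// Sketch of the AsyncGeneratorRequest queue as a singly linked list.
// In the common case the queue is empty and the append is a single
// load, test, and store; only when requests pile up do we walk to the
// tail (the O(N) append being discussed).
function enqueueRequest(generator, request) {
  const first = generator.requestQueue; // Load(generator, requestQueue)
  if (first === undefined) {
    generator.requestQueue = request;   // fast path: one store
    return;
  }
  let node = first;                     // slow path: walk to the tail
  while (node.next !== undefined) node = node.next;
  node.next = request;
}
```

With a single in-flight request (the `for await` case), the slow path never runs, which is why the linked list is cheap in practice.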
In the case we're for awaiting on an asynchronous generator (rather than a custom built async iterator) - can't we fast path and not allocate any promises? (since they are invisible to the user who can't react to them).
So as I said, Await allocates 2x closures and 2x Promises.
In the current design, I think we could easily eliminate the 2x closures by adding new private symbols indicating a common behaviour, which PromiseHandle would know about.
Eliminating the Promise allocations for Await would take more work, and I'm not 100% sure how it would fit together right now, since it's all very tied to the Promise implementation right now. @gsathya might have some ideas on how to do that.
Sorry to bother you with a not-so-relevant nit.
RE async iterators and streams.
Please consider whether we can do something for `readline.createInterface()` here. Currently, we offer a rather bulky way to read a file line-by-line. I really miss the simplicity of the Perl `while (<FILE>) { doSomethingWith($_); }` when I need to process multi-gigabyte text files with some CLI utilities.
@vsemozhetbyt I think that's just an API question, not a perf thing. Things we could do:
- Make readline interfaces async-iterable, i.e. `for await (const line of readlineInterface)`
- Or maybe even better: make all event emitters create async iterables for `.on()` / return promises for `.once()`, i.e. `for await (const line of readlineInterface.on('line'))` and `await process.on('message')`
Yes, sorry, I was also thinking about it as a use case that can be benchmarked easily for a big file, if I get this right.
@addaleax @vsemozhetbyt see prior work at nodejs/readable-stream#254 (ccing @mcollina )
I think that avoiding the two closures in readable streams is a good start - but allocating two promises would still make this a show stopper for real node event emitters and streams. Optimally I'd like to get:

```js
for await (const item of stream) {
  // work on item
}
```

as fast as

```js
stream.on("data", item => { /* work on item */ });
```

in order for it to be useful for servers.
Although I really think we should focus on making async/await a zero overhead abstraction and there is definitely still a lot that can be done for that.
Refs: https://medium.com/netscape/async-iterators-these-promises-are-killing-my-performance-4767df03d85b
Very nice, it shows there are misconceptions and that async iterators are poorly communicated. For example this part:

The problem is that Promises have to be resolved in the next turn of the event loop. This is part of the spec. There's no way around it.

is incorrect. The Promises/A+ spec (which is the wrong spec for this case, by the way) only requires that "only platform code" remain on the stack before executing the next chunk. Since this is I/O dominated this happens anyway - and in particular `nextTick` can be avoided in almost every case (as illustrated in the second half of #31 (comment)).
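The distinction can be demonstrated directly (a minimal sketch): a `.then` callback on an already-resolved promise runs as a microtask at the end of the *current* event-loop turn, before any timer scheduled for the next turn - not "in the next turn of the event loop".

```javascript
'use strict';
const order = [];

// Scheduled for a later turn of the event loop (a macrotask).
setTimeout(() => order.push('timer'), 0);

// Runs as a microtask at the end of the *current* turn.
Promise.resolve().then(() => order.push('microtask'));

order.push('sync');

setTimeout(() => {
  console.log(order); // ['sync', 'microtask', 'timer']
}, 5);
```

So a promise chain over in-memory data does not force a full event-loop turn per step; the cost is one microtask, which is much cheaper.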
Pinging @danvk to weigh in on their article :)
I was thinking about holding the meeting the week after next (I think we have enough to discuss at this point). Is that time problematic for anyone who would like to participate?
Hi @benjamingr! I chimed in on the TC39 issue. It's been interesting to read all the responses to that article. One of my main takeaways is that, at least for my use of reading a big CSV file while starting a server, sync is just fine.
I'd love to hear more about this:
Very nice, it shows there are misconceptions and that async iterators are poorly communicated. For example this part:

The problem is that Promises have to be resolved in the next turn of the event loop. This is part of the spec. There's no way around it.

is incorrect. The Promises/A+ spec (which is the wrong spec for this case, by the way) only requires that "only platform code" remain on the stack before executing the next chunk. Since this is I/O dominated this happens anyway - and in particular nextTick can be avoided in almost every case (as illustrated in the second half of #31 (comment)).
I was using node 7.6.0 and TypeScript with `target: "es6"` for all my testing. Some additional testing with different combinations shows that there are considerable differences:
```
# Node 7.6, { "target": "es6" }
sync: 209 ms
async: 2856 ms

# Node 8.4, { "target": "es6" }
sync: 86 ms
async: 1193 ms

# Node 8.4, { "target": "ES2017" }
$ ./node_modules/.bin/ts-node async-iter.ts
sync: 57 ms
async: 973 ms
```
So looks like progress is being made. If for-await-of can be fast, then that's the best of all worlds!
@danvk If you can run your test in a recent Node-lkgr, Chrome Canary version, using the native implementation rather than a transpiled version, I think this would give some better information. Let me know if you want help setting that up.
FWIW, nodejs/node#15081.