Considerable slowness on node environment in comparison to bluebird
niwinz opened this issue · 8 comments
Using this test code:
var Zousan = require('zousan');
var Bluebird = require('bluebird');
var bluebirds = [];
var zousans = [];
console.time('zousan');
for(var i = 0; i < 500000; i++) {
  zousans.push(new Zousan(function(resolve, reject) {
    resolve(1);
  }));
}
Zousan.all(zousans).then(() => {console.timeEnd('zousan')});
// ----------------------
console.time('bluebird');
for(var j = 0; j < 500000; j++) {
  bluebirds.push(new Bluebird(function(resolve, reject) {
    resolve(1);
  }));
}
Bluebird.all(bluebirds).then(() => {console.timeEnd('bluebird')});
And this environment:
[3/5.2]{1}niwi@kaleidos:~/tmp/speedtest> npm ls
/home/niwi/tmp/speedtest
├── bluebird@3.3.4
└── zousan@2.2.2
[3/5.2]niwi@kaleidos:~/tmp/speedtest> node --version
v5.10.1
The results are:
[3/5.2]niwi@kaleidos:~/tmp/speedtest> node speedtest.js
zousan: 1112.415ms
bluebird: 440.596ms
Thanks for the heads-up Andrey. That is a big difference.
One problem with your test, however, is that it is running them simultaneously. Better to wait until the Zousan test finishes before starting the Bluebird test.
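As a rough illustration of running the tests one after the other, here is a minimal sketch. The helper names timeBatch and runSequentially are made up for this example and are not part of either library, and the native Promise stands in for both candidates so the snippet runs without any dependencies:

```javascript
// Sketch: time one library at a time, starting the next run only after
// the previous one has fully settled. `timeBatch` and `runSequentially`
// are hypothetical helper names, not part of Zousan or Bluebird.
function timeBatch(label, PromiseImpl, count) {
  var start = process.hrtime();
  var batch = [];
  for (var i = 0; i < count; i++) {
    batch.push(new PromiseImpl(function (resolve) {
      resolve(1);
    }));
  }
  return PromiseImpl.all(batch).then(function () {
    var diff = process.hrtime(start); // [seconds, nanoseconds]
    return { label: label, ms: diff[0] * 1e3 + diff[1] / 1e6 };
  });
}

function runSequentially(candidates, count) {
  var results = [];
  // Chain the runs on a native promise so they never overlap.
  return candidates.reduce(function (chain, c) {
    return chain.then(function () {
      return timeBatch(c.label, c.impl, count);
    }).then(function (r) {
      results.push(r);
    });
  }, Promise.resolve()).then(function () {
    return results;
  });
}

// Native Promise is used twice here as a stand-in; swap in
// require('zousan') and require('bluebird') to compare those libraries.
runSequentially([
  { label: 'first', impl: Promise },
  { label: 'second', impl: Promise }
], 1000).then(function (results) {
  results.forEach(function (r) {
    console.log(r.label + ': ' + r.ms.toFixed(3) + 'ms');
  });
});
```

With this shape, the second timer only starts once the first batch has resolved, so neither run competes with the other for the event loop.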
That said, after testing them independently, there is still a substantial performance difference. Though in smaller batches, Zousan comes out ahead:
With a 100 promise batch, for example:
zousan: 2.687ms
bluebird: 3.961ms
Another issue with many tests is "startup costs" - as some libraries incur a significant "first time through" cost - and indeed, if running both these tests twice within the same program (100 promises per batch), we get:
zousan: 2.708ms
bluebird: 3.767ms
zousan: 1.064ms
bluebird: 2.457ms
But again, after cranking it up to 500,000 promises, Bluebird beats Zousan by a substantial margin. I will look into this a bit further. It would be nice to capture any additional speed we can, as long as it doesn't interfere with our other goals (such as keeping it simple, small, and cross-platform).
Nice! I'm currently maintaining https://github.com/funcool/promesa, which is backed by bluebird, but I'm planning to switch to zousan because of the evident performance and the small size in comparison to bluebird.
Thanks for creating this library.
@niwinz there are a lot of non-spec features (tap, spread, etc.) and debuggability that were added to bluebird, plus faster performance server-side with a large number of promises.
If you're targeting server-side, I'd stay with bluebird since size difference isn't so important.
I'm using it mainly for web development and the size matters. Thanks.
Cool project @niwinz - and thanks for chiming in @avimar - good points.
I did some more testing and investigating. I tried some changes to see if I could speed things up (listed in a later comment). I found that as batches got smaller, the differences likewise got smaller, and somewhere between 10,000 and 20,000 promises the two libraries cross over. I also added an in-series test (.all is parallel) and threw in the native Node Promise object for comparison. I also had each promise return a simple (unique) object to make it more realistic. (Will post code as a Gist.)
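To make the difference between the "All" and "Series" tests concrete, here is a minimal sketch of how such a harness could look. This is not the Gist code; the names makeTask, runAll, runSeries and averageMs are hypothetical, and the native Promise is used so the snippet runs on its own:

```javascript
// Sketch of an "all" (parallel) test, an "in series" test, and averaging.
// All function names are hypothetical; this is not the author's Gist.
function makeTask(PromiseImpl, id) {
  // Resolve with a small unique object rather than a constant.
  return new PromiseImpl(function (resolve) {
    resolve({ id: id });
  });
}

function runAll(PromiseImpl, count) {
  var batch = [];
  for (var i = 0; i < count; i++) {
    batch.push(makeTask(PromiseImpl, i));
  }
  return PromiseImpl.all(batch);
}

function runSeries(PromiseImpl, count) {
  // Assumes count >= 1. Each promise is created only after the
  // previous one has resolved.
  var results = [];
  var chain = makeTask(PromiseImpl, 0);
  for (var i = 1; i < count; i++) {
    chain = chain.then((function (id) {
      return function (value) {
        results.push(value);
        return makeTask(PromiseImpl, id);
      };
    })(i));
  }
  return chain.then(function (value) {
    results.push(value);
    return results;
  });
}

function averageMs(testFn, PromiseImpl, count, repetitions) {
  var total = 0;
  var chain = Promise.resolve();
  for (var r = 0; r < repetitions; r++) {
    // Repetitions run one after another, never overlapping.
    chain = chain.then(function () {
      var start = process.hrtime();
      return testFn(PromiseImpl, count).then(function () {
        var diff = process.hrtime(start);
        total += diff[0] * 1e3 + diff[1] / 1e6;
      });
    });
  }
  return chain.then(function () {
    return total / repetitions;
  });
}

averageMs(runAll, Promise, 100, 5).then(function (ms) {
  console.log('Native-All - average over 5 repetitions = ' + ms.toFixed(2) + 'ms');
  return averageMs(runSeries, Promise, 100, 5);
}).then(function (ms) {
  console.log('Native-Series - average over 5 repetitions = ' + ms.toFixed(2) + 'ms');
});
```

Passing a different constructor (for example the export of require('zousan') or require('bluebird')) as PromiseImpl would exercise that library instead.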
Some results:
For a 10 promise batch (each test is run 5 times, average value shown):
Bluebird-All - Average runtime for 5 repititions = 0.95ms
Zousan-All - Average runtime for 5 repititions = 0.13ms
Native-All - Average runtime for 5 repititions = 0.44ms
Bluebird-Series - Average runtime for 5 repititions = 0.54ms
Zousan-Series - Average runtime for 5 repititions = 0.06ms
Native-Series - Average runtime for 5 repititions = 0.3ms
Zousan is several times faster in both All and Series tests. Native is also beating Bluebird here.
For 100 promise batch:
Bluebird-All - Average runtime for 5 repititions = 1.74ms
Zousan-All - Average runtime for 5 repititions = 0.64ms
Native-All - Average runtime for 5 repititions = 1.51ms
Bluebird-Series - Average runtime for 5 repititions = 2.01ms
Zousan-Series - Average runtime for 5 repititions = 0.57ms
Native-Series - Average runtime for 5 repititions = 0.62ms
This batch size is possibly quite common (a directory of stats, etc.) so possibly quite relevant in actual use on Node. Zousan still quite a bit faster. Native also holds its own, especially in the series run.
A 1000 promise batch:
Bluebird-All - Average runtime for 5 repititions = 4.79ms
Zousan-All - Average runtime for 5 repititions = 3.56ms
Native-All - Average runtime for 5 repititions = 6.92ms
Bluebird-Series - Average runtime for 5 repititions = 4.1ms
Zousan-Series - Average runtime for 5 repititions = 1.7ms
Native-Series - Average runtime for 5 repititions = 6.97ms
Native promises starting to choke a bit. Zousan still the clear winner, especially in a Series.
A 10,000 promise batch:
Bluebird-All - Average runtime for 5 repititions = 28.78ms
Zousan-All - Average runtime for 5 repititions = 34.91ms
Native-All - Average runtime for 5 repititions = 86.19ms
Bluebird-Series - Average runtime for 5 repititions = 17.81ms
Zousan-Series - Average runtime for 5 repititions = 12.47ms
Native-Series - Average runtime for 5 repititions = 83.78ms
Looks like Bluebird takes the blue ribbon on the All() here, beating Zousan by 17.5%. In Series runs, Zousan is faster by about 30%. Native promises are several times slower here in both cases.
A 100,000 promise batch:
Bluebird-All - Average runtime for 5 repititions = 274.68ms
Zousan-All - Average runtime for 5 repititions = 425.76ms
Native-All - Average runtime for 5 repititions = 1062.65ms
Bluebird-Series - Average runtime for 5 repititions = 164.22ms
Zousan-Series - Average runtime for 5 repititions = 302.12ms
Native-Series - Average runtime for 5 repititions = 721.73ms
Here, Bluebird pulls away from Zousan handily, while Native is losing badly.
Just for "fun", let's do a million promises:
Bluebird-All - Average runtime for 5 repititions = 2687ms
Zousan-All - Average runtime for 5 repititions = 5265ms
Native-All - Average runtime for 5 repititions = CRASHED (Out of Memory)
Bluebird-Series - Average runtime for 5 repititions = 1386.61ms
Zousan-Series - Average runtime for 5 repititions = 2419.32ms
Native-Series - Average runtime for 5 repititions = 8043.16ms
So, here are my thoughts at this point:
Bluebird is doing something clever to handle very large batches of promises. I'm not sure what it is. I even took a look at their source code - I see some BIT_FIELD variables that might suggest they are storing statuses in a bit mask of some sort - which would require less memory per promise - which at super large numbers might start to pay off (though would slow you down for smaller batches). Also it appears they may in-line some code that I make into a function (for clarity, re-use, and size).
So at the risk of sounding opportunistic in the face of defeat - I'm leaning towards chalking this up as a choice of what to excel in. Do you optimize for 10,000 and less or for 20,000 and up? Do you opt for code speed at all costs, even if it inflates the source code length and complexity?
I'm feeling like we have a good balance here - and at least in all the cases I've used promises, Zousan is the faster library while still enjoying the benefits of small and simple. In fact, if Bluebird wasn't as heavily used as it is, I would be suspicious of its robustness.
For those interested, I will add the test code as a Gist, and add another comment here with some of the ideas I tried to speed up Zousan (all of them resulted in a slowdown).
So I am currently out of ideas to speed this up while staying within the goals of the project. If others have some ideas, I'm certainly interested to hear them. I read somewhere that some folks from Google helped Petka speed up Bluebird - so I'd be happy to talk with Google if they contact me.
The gist is here.
For those interested, here are the changes I tried to see if they sped up Zousan:
Use nextTick in place of proprietary soon() function. This slowed things down.
Use nextTick within soon() in place of setImmediate(). This also slowed things a bit (less than above)
Optimized for single then() - converting to array when necessary. Slowed it down.
Pre-init some variables as undefined - slower
Initialized All array to size of result to return - no effect
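For anyone who wants to poke at the scheduling side of the first two items, here is a small sketch that times a chain of deferrals through process.nextTick and through setImmediate. It only measures the raw scheduling primitives; timeDeferrals is a hypothetical helper and does not reproduce Zousan's soon() function, so its numbers say nothing directly about how either primitive behaves inside a promise library:

```javascript
// Sketch: time `hops` chained deferrals through a given scheduler.
// `timeDeferrals` is a hypothetical helper; it does not reproduce
// Zousan's soon() implementation.
function timeDeferrals(schedule, hops, done) {
  var start = process.hrtime();
  var remaining = hops;
  function step() {
    if (--remaining > 0) {
      schedule(step); // defer the next hop
    } else {
      var diff = process.hrtime(start);
      done(diff[0] * 1e3 + diff[1] / 1e6);
    }
  }
  schedule(step);
}

function viaNextTick(fn) { process.nextTick(fn); }
function viaSetImmediate(fn) { setImmediate(fn); }

// Run the two measurements one after the other.
timeDeferrals(viaNextTick, 10000, function (tickMs) {
  console.log('nextTick: ' + tickMs.toFixed(2) + 'ms');
  timeDeferrals(viaSetImmediate, 10000, function (immediateMs) {
    console.log('setImmediate: ' + immediateMs.toFixed(2) + 'ms');
  });
});
```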
And again, remember that when running tens of thousands of promises, the time of the deferred function will quite likely dwarf the time of the promise code - there is no real use in processing 10k promises that simply resolve(1) [I'm still in favor of maxing the speed of promises, of course!] - which is why a speed difference when processing 10 promises is arguably more important than a speed difference when processing 100,000.
To wit, the per-promise cost of processing 10 promises by Bluebird was 0.95ms / 10 = 0.095ms and Zousan was 0.013ms per promise. At 100,000 the per promise cost for Bluebird was 274.68ms / 100000 = 0.0027ms and for Zousan was 0.0042ms. So, in Bluebird's case, when processing 10 promises, the promise code is 35 times more "influential" to the total running time than when processing 100k. (i.e. each promise takes 35 times longer to run)
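The arithmetic above can be reproduced from the "All" timings posted earlier in this thread. A minimal sketch (perPromiseMs is just an illustrative helper name):

```javascript
// Per-promise cost computed from the "All" averages reported above.
function perPromiseMs(totalMs, count) {
  return totalMs / count;
}

var bluebirdSmall = perPromiseMs(0.95, 10);       // 10 promise batch
var bluebirdLarge = perPromiseMs(274.68, 100000); // 100,000 promise batch
var zousanSmall = perPromiseMs(0.13, 10);
var zousanLarge = perPromiseMs(425.76, 100000);

// How many times more expensive each promise is in the small batch.
var bluebirdRatio = bluebirdSmall / bluebirdLarge; // roughly 35
var zousanRatio = zousanSmall / zousanLarge;       // roughly 3

console.log('Bluebird small/large per-promise ratio: ' + bluebirdRatio.toFixed(1));
console.log('Zousan small/large per-promise ratio: ' + zousanRatio.toFixed(1));
```

By the same measure, Zousan's per-promise cost in the 10 promise batch is only about 3 times its cost in the 100,000 promise batch.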
Will close for now - but happy to re-open if we get new info about this.
@bluejava thanks for this exhaustive analysis. I agree that a 20k promise batch seems like an unusual use case, and in the most common use cases zousan is clearly the winner. Nothing more to say here.
Great work. I'm seriously considering switching to zousan in my clojurescript library.
Thanks Andrey - was good to go down this path a bit more thoroughly.
Best of luck with your clojurescript library!