Potential performance improvement
NickStrupat opened this issue · 1 comments
Hi! Thanks for the awesome work here. I've been tinkering with it for the past couple of evenings.
I've been testing for potential performance improvements by switching from std::vector
& your allocator to using blocks of memory allocated on the stack (via alloca
). I'm seeing a 25%-ish improvement for short strings, but about 25%-ish degradation for longer strings. I'm wondering if you have any ideas why that might be?
Also, I haven't yet done a proper multi-thread test. I suspect the contention on _mm_malloc
will yield a higher performance improvement when using alloca
, since the threads don't need to sync on heap allcoation. If I get some time next week I will try to do that and I'd love to hear what you think of this approach.
The changes I made to achieve the stack allocation are here...
NickStrupat@58ee960
I like the idea, but switching to alloca
unconditionally might not be ideal, because it’s completely possible to bring things to a stack overflow using a single call that way…
What we do in Node.js core a lot is using a fixed-size stack-allocated buffer for small strings, and deferring to a larger, heap-allocated buffer for longer ones.
Also, this library is generally better at handling long strings – for short ones, vectorization doesn’t really pay out, and there’s the additional overhead of JS → C++ calls.