team-charls/charls

performance regression since 2.1.1

chafey opened this issue · 5 comments

I just pulled the latest version of charls and noticed a ~15% drop in decode performance for the WASM build compared with 2.1.1 (hash 0bafe4e). I pulled 2.2.0 and see the regression there as well. Is this expected/known? I checked the release notes and didn't see anything about it.

I was not aware of a performance regression in the WASM build; I don't normally profile it.
There have been changes to make the fuzzer tests pass.

Any recommendations for a WASM profiler?

Before digging into WASM specific issues we should confirm that there is no regression in the native build. Do you capture performance metrics with each release right now by any chance? The commit I am using for the "fast version" is 4d1ef38 and the image I am using is CT1.JLS here: https://github.com/chafey/charls-js/blob/master/test/fixtures/jls/CT1.JLS

Do you have an easy way to compare native performance between that hash and head? If not, I can work on this later.

First checking native perf makes sense. There is no infrastructure to capture and analyze perf between releases at the moment.

The charlstest.exe tool provides an option to measure performance: -decodeperformance[:loop-count]. It looks for a .jls file named decodetest.jls. Swapping in the charls-2-x64.dll built from each version should work.

The tool https://github.com/team-charls/charls-image-test will also measure decoding/encoding performance.

Initial benchmarking shows no difference between 4d1ef38 and latest version for native performance:

Decoding CT1.JLS: about 3 ms on an AMD Ryzen CPU
Decoding 8-bit gray 5412 * 7216 (big_building.jls): about 510 ms on an AMD Ryzen CPU

x64 release build with MSVC 19.31.30818
Note: testing done with charlstest.exe -decodeperformance:100

I compared the latest CharLS-JS (chafey/charls-js@f5f168b) with CharLS-JS + latest CharLS (cce893e).

When building with emscripten 3.1.1 the latest version of CharLS is even a little bit faster. For example:
CharLS-JS 2.1.1 for MG1: 221 ms
CharLS-JS 2.2.1 for MG1: 193 ms

I saw the same pattern for the other images and for the Node benchmark tests: equal, or 2.2.1 is a little bit faster.

Note 1: With the Chrome dev tools open (F12), decoding becomes much slower, sometimes by a factor of 2.
Note 2: I had to add a special #ifdef __EMSCRIPTEN__ to the CharLS source code because the WebAssembly build failed with the generic read_unaligned method.