CharLS Codecs Performance Issues After Update
sedghi opened this issue · 10 comments
We have been using the CharLS codecs in our decoding pipeline for compressed medical imaging DICOM formats, and we recently updated our fork to the latest version as of December 2022 (commit 208b1bf). Unfortunately, we are now experiencing a significant decrease in performance and slower decoding times compared to our previous CharLS base from Jan 13th, 2021 (commit 62140f5).
I was wondering if you could help me debug this problem. Are there any new flags that should be set for performance?
Our CharLS codecs: https://github.com/cornerstonejs/codecs/tree/main/packages/charls
Are you seeing a performance decrease in the native C++ build or in the WebAssembly build?
I once looked into a WebAssembly performance problem, but that turned out to be caused by having the developer tools window open (F12).
For a question about the native C++ build, I need more details: which OS, C++ compiler, and platform (x64/ARM/etc.)?
It is the WebAssembly build.
It is not a case of the developer tools being open; we are simply seeing considerably slower performance.
Any recommendation on how to debug this?
There is only one other thing that has changed compared to the last (faster) version from Jan 2021: previously we had TOTAL_MEMORY=1gb, which caused a lot of issues on Windows, so for this version (Dec 2022) we have used
ALLOW_MEMORY_GROWTH=1
TOTAL_MEMORY=50mb
Can this reduction in the total memory be related?
This is the CMake:
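Roughly, settings like these are passed to the Emscripten linker as link options; this is only a sketch (not the actual file, and the target name charlsjs is an assumption):

if(EMSCRIPTEN)
  target_link_options(charlsjs PRIVATE
    "SHELL:-s ALLOW_MEMORY_GROWTH=1"   # heap may grow at runtime when needed
    "SHELL:-s TOTAL_MEMORY=50MB"       # initial heap size; was 1gb in the Jan 2021 build
  )
endif()

Switching TOTAL_MEMORY back to the old 1 GB value (or using the newer INITIAL_MEMORY name) at that spot would be a quick way to test whether the smaller initial heap explains the slowdown.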
When I compare the performance between the demo at
https://chafey.github.io/charls-js/test/browser/index.html
and my forked copy with only CharLS updated to v2.4.1, I am getting:
WebAssembly v2.1.1: decode time 51 ms for SC1
WebAssembly v2.4.1: decode time 47 ms for SC1
Other images show the same behaviour: v2.4.1 is slightly faster than v2.1.1.
This is with emcc 3.1.5 on Ubuntu 22.0 and Chrome 109.
It is possible that TOTAL_MEMORY=50mb has an impact. That would be easy to verify by going back to the original value and checking whether it makes a difference. CharLS itself needs some stack space and one scan line as a buffer, but otherwise it works directly on the passed input and output buffers. 50 MB should in general be enough to decode a DICOM image.
Note: I am assuming that you are testing the release build; the debug build would of course be significantly slower.
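As a rough, purely illustrative sanity check (assuming a 2048 x 2048 single-component 16-bit image, which is on the larger side for a typical slice):

  decoded output buffer ≈ 2048 × 2048 × 2 bytes ≈ 8 MiB
  scan-line buffer      ≈ a few × 2048 × 2 bytes, i.e. a few KiB

Even with the compressed input buffer on top of that, a 50 MB heap leaves plenty of headroom, and ALLOW_MEMORY_GROWTH=1 lets the heap grow for the occasional larger image.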
@vbaderks Yeah, the memory difference came to my mind today as well, so I will test the new code with the bigger memory setting.
I'm not sure what the difference is between the release build and the debug build. Is there any documentation on which is which?
Thanks!
CharLS can be built in debug or release mode. Debug has more checks and is easier to debug, but is not optimized.
WebAssembly also has this concept. It can be controlled with the CMake option CMAKE_BUILD_TYPE=Debug or CMAKE_BUILD_TYPE=Release. When this option is not set, the default is Release.
Note: the latest emcc docs use the option INITIAL_MEMORY instead of TOTAL_MEMORY. Probably both names can still be used, but I didn't test it.
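For illustration only (a sketch, not the actual CMakeLists.txt of the codecs repo), the default can be forced to an optimized build like this, and an explicit Release configure would look like the command in the comment:

if(NOT CMAKE_BUILD_TYPE)
  # Fall back to an optimized build when no build type was requested.
  set(CMAKE_BUILD_TYPE Release CACHE STRING "Debug or Release" FORCE)
endif()
# Explicit configure, e.g.: emcmake cmake -B build -DCMAKE_BUILD_TYPE=Release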
@sedghi: Did you make any progress resolving the WebAssembly performance issue you are observing?
I have not seen any performance issues for WebAssembly with the v2.3.y versions.
Some optimization work has been done for native CPUs, but that code is disabled when building for WebAssembly as it would crash (last tested a year ago). The WebAssembly compiler is still under active development, so this may also have been resolved by now.