srijs/rust-crc32fast

Hashing on target `wasm32-unknown-unknown` is very slow

tonyhb opened this issue · 5 comments

Following up on an issue reported in zip-rs (zip-rs/zip-old#144): calculating the CRC in wasm slows down decompressing files by roughly 150x.

This library is incredible on native targets, but unfortunately it poses some issues when compiled with wasm_bindgen. Disclaimer: I know almost nothing about wasm but am looking into it.
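For context, zip stores a standard CRC-32 (the reflected IEEE polynomial `0xEDB88320`) for each entry. Below is a minimal, byte-at-a-time, table-driven sketch of that checksum in plain Rust, for illustration only. It is not crc32fast's implementation, which uses faster techniques and hardware instructions where they are available. The names `make_table`, `CRC_TABLE`, and `crc32` are hypothetical.

```rust
/// Build the 256-entry lookup table for the reflected CRC-32 (IEEE) polynomial.
const fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut i = 0;
    while i < 256 {
        let mut crc = i as u32;
        let mut bit = 0;
        // Process the 8 bits of this table index one at a time.
        while bit < 8 {
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0xEDB8_8320
            } else {
                crc >> 1
            };
            bit += 1;
        }
        table[i] = crc;
        i += 1;
    }
    table
}

// Evaluated at compile time, so there is no runtime setup cost.
static CRC_TABLE: [u32; 256] = make_table();

/// Byte-at-a-time CRC-32 over `data`.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = !0u32; // initial value 0xFFFFFFFF
    for &byte in data {
        let index = ((crc ^ byte as u32) & 0xFF) as usize;
        crc = (crc >> 8) ^ CRC_TABLE[index];
    }
    !crc // final XOR with 0xFFFFFFFF
}

fn main() {
    // "123456789" is the conventional CRC-32 check input.
    println!("{:08x}", crc32(b"123456789")); // prints "cbf43926"
}
```

The output `cbf43926` is the published CRC-32 check value, which makes a quick sanity test for any implementation.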

tonyhb commented

For sure! Here's a minimal repro that I created for you: https://github.com/tonyhb/rc32fast-repro

To run the wasm version, run `make build-web` and then `make run-web`. You can also run `make` to build and test natively. It comes with 3 zips; the larger 3 MB zip may leave the page spinning, so for profiling I use the smaller 256 KB zip.

I read your code and couldn't understand why, either, but I've been doing Rust for all of 4 days and need to dive into how it all works. Getting there!

srijs commented

Thanks for the minimal reproduction, that was super useful!

The fix itself is relatively simple: You need to build the code in "release" mode (by passing --release to cargo) in order to get decent performance. With this flag enabled, parsing the 3MB zip file takes ~200ms in total, with the crc32 calculation at around 20ms.

In your case, I've applied the following changes:

diff --git a/Makefile b/Makefile
index 2174ce3..53de813 100644
--- a/Makefile
+++ b/Makefile
@@ -6,6 +6,6 @@ run-web:

 build-web:
        RUSTFLAGS=--cfg=web_sys_unstable_apis \
-                 cargo build --target wasm32-unknown-unknown
-       wasm-bindgen ./target/wasm32-unknown-unknown/debug/crc32fast-repro.wasm --out-dir target/build --web
+                 cargo build --target wasm32-unknown-unknown --release
+       wasm-bindgen ./target/wasm32-unknown-unknown/release/crc32fast-repro.wasm --out-dir target/build --web
        cp ./src/web/index.html ./target/build/

If you're new to Rust, here is something you can take away from this: Building Rust programs in debug mode rarely gives an accurate picture when it comes to performance, because a lot of Rust's higher-level abstractions rely on the optimizer to achieve their performance goals.
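As an illustration of that point (a sketch, not code from crc32fast), here is the same bitwise CRC-32 written two ways: once with nested iterator `fold`s and closures, and once with hand-written loops. Both return the same value. With `--release`, the optimizer can typically inline the iterator and closure calls into a tight loop; in a debug build they usually remain separate function calls per byte and per bit, which is where much of the overhead comes from. The exact code generated depends on the compiler version and target, and the function names are hypothetical.

```rust
const POLY: u32 = 0xEDB8_8320; // reflected CRC-32 (IEEE) polynomial

/// Iterator style: nested folds whose closures need inlining to be fast.
fn crc32_iter(data: &[u8]) -> u32 {
    !data.iter().fold(!0u32, |crc, &byte| {
        // Feed in one byte, then shift it through 8 times.
        (0..8).fold(crc ^ byte as u32, |c, _| {
            if c & 1 != 0 { (c >> 1) ^ POLY } else { c >> 1 }
        })
    })
}

/// Loop style: the same computation with explicit indices.
fn crc32_loop(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    let mut i = 0;
    while i < data.len() {
        crc ^= data[i] as u32;
        let mut bit = 0;
        while bit < 8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ POLY } else { crc >> 1 };
            bit += 1;
        }
        i += 1;
    }
    !crc
}

fn main() {
    let input = b"123456789";
    // Same result either way; only the generated code differs by build mode.
    assert_eq!(crc32_iter(input), crc32_loop(input));
    println!("{:08x}", crc32_iter(input));
}
```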

I've spent some time looking at the generated WASM as well as the LLVM IR, and I'm still not entirely sure why exactly Rust/LLVM decides to add the call to memcpy when compiling in debug mode; perhaps there's something that could be improved in LLVM's WASM backend.

But at this stage the best thing to do is probably to compile in release mode and make sure that the memcpy calls get removed by the optimizer. As your application grows more complex, the chances that you'll use another abstraction or library that runs into similar issues with the generated code in debug mode are fairly high, so using --release is going to be the best way to address this broadly.

You can of course still test and debug your code without --release, which will give you a quicker dev loop, but for builds where you care about performance you'll want to add the --release flag.
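One way to notice an accidental debug build is to check `cfg!(debug_assertions)` at startup. This is only a proxy: with Cargo's default profiles, `debug_assertions` is on for a plain `cargo build` and off for `cargo build --release`, but it is configured separately from the optimization level, so a customized profile can break the correlation. The function name `build_kind` is hypothetical. Note that on `wasm32-unknown-unknown` stderr is not connected to the browser console, so in a web app you would log the warning through something like `web_sys::console` instead of `eprintln!`.

```rust
/// Reports "debug" or "release" based on `debug_assertions`
/// (assumes Cargo's default dev/release profiles).
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) { "debug" } else { "release" }
}

fn main() {
    if build_kind() == "debug" {
        // Warn so that timings from an unoptimized build aren't mistaken for real numbers.
        eprintln!("warning: unoptimized debug build; rebuild with --release before profiling");
    }
    println!("build: {}", build_kind());
}
```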

Let me know if you have any questions or don't agree with this, otherwise I'm inclined to close this issue!

tonyhb commented

Wow. Can't believe I didn't look at that 🤦. Sorry to take up your time, dude, and I appreciate you taking your personal time to look at it.

Did not expect that memcpy would be added in debug mode.

Thanks again. Let's close.

Also was just catching up with a coworker and (this is very off topic) we both really appreciate the time and effort you put into your response. It's super helpful, very thoughtful, and it's also really friendly. Thank you for being awesome!