Error: WASM string decode failed crc32 validation
joyqi opened this issue ยท 15 comments
Hi, I'm using opus-decoder to develop a Chrome extension. When I create an OpusDecoder instance, it throws an exception.
Uncaught (in promise) Error: WASM string decode failed crc32 validation
at Function.value (WASMAudioDecoderCommon.js:165:37)
at Function.value (WASMAudioDecoderCommon.js:173:31)
at Function.value (WASMAudioDecoderCommon.js:30:17)
at EmscriptenWASM.getModule (EmscriptenWasm.js:316:20)
at EmscriptenWASM.instantiate (EmscriptenWasm.js:318:5)
at WASMAudioDecoderCommon.instantiate (WASMAudioDecoderCommon.js:293:26)
at OpusDecoder._init (OpusDecoder.js:30:24)
at new OpusDecoder (OpusDecoder.js:238:14)
at startRecording (offscreen.ts:48:9)
I put a breakpoint here:
The strangest thing is that if I use the ogg-opus-decoder library instead, it seems to work normally without any errors (of course, since the encoding is incorrect it outputs an empty result, but it runs without failing). And looking at the source code, ogg-opus-decoder depends on the opus-decoder library.
I've created a demo project to reproduce this error, which you can access here: https://github.com/joyqi/opus-decoder-chrx-demo
Thanks for opening this issue, and for providing details. I can see the problem in your screenshot.
To explain the issue: the WASM binaries are embedded into the JS files as an extended-ASCII encoded string. It uses almost all extended-ASCII values 0-255, where each character represents one byte of WASM binary data. It looks like somewhere in your build process the single-byte characters are being transformed into UTF-8 characters, which changes how JavaScript converts those characters back into byte values. This fails the CRC validation, since the bytes no longer match what is expected.
To fix this, you'll need to configure your build toolchain to avoid UTF-8 conversion on strings that contain characters outside of the normal ASCII range (0-127). The characters in the extended-ASCII range (128-255) should not be encoded into UTF-8. The rest of your project can remain UTF-8 encoded; the build process just should not modify the WASM string containing the binary data.
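To make the failure mode concrete, here is a minimal sketch of the byte-per-character decoding and why a UTF-8 round trip breaks it. Note that `decodeToBytes` is an illustrative stand-in, not the library's actual function:

```typescript
// Illustrative sketch: the WASM binary is embedded as a string where each
// character's code point (0-255) is one byte of binary data.
const decodeToBytes = (str: string): Uint8Array =>
  Uint8Array.from(str, (c) => c.charCodeAt(0));

// One extended-ASCII character decodes to exactly one byte:
const original = "\u00c3"; // code point 0xC3
const good = decodeToBytes(original); // 1 byte: 0xC3

// If a build step rewrites that character as its two UTF-8 bytes
// (0xC3 0x83) read back as two separate characters, the decoded output
// gains an extra byte, so a checksum over the result no longer matches:
const mangled = "\u00c3\u0083";
const bad = decodeToBytes(mangled); // 2 bytes: 0xC3 0x83
```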
Here's a screenshot of the debugger running in the demo page, paused at the same place as in your screenshot. The difference here is in the byteIndex and source variables shown in the Local scope viewer on the right.
- The byteIndex is larger in your example, since the single-byte characters are being escaped into UTF-8 characters, which take two bytes when representing anything in extended ASCII (128-255).
- The source string in your example replaces any characters in extended ASCII (128-255) with a unicode escape character.
I might also be able to add an enhancement to the library I use to encode and decode the WASM string to work around this, if you can't update your toolchain to avoid the UTF-8 conversion.
Thanks for your reply; your tips gave me some ideas. I'm using the Plasmo framework to develop Chrome extensions, which uses parcel to build the project. In the Plasmo code, I found that it calls swc to compile and bundle the code. Indeed, there are some discussions about the transformation of non-ASCII strings in the swc project (e.g. swc-project/swc#1744, swc-project/swc#1188). I think this is where the problem lies, but I currently can't find a way to turn it off in the project; I will keep trying.
As for opus-decoder, I wonder if the WASM string could be encoded in base64, although this would significantly increase the file size.
UPDATE: I've found a hacky way to avoid the string transformation: https://github.com/PlasmoHQ/plasmo/blob/main/core/parcel-config/index.json#L66. This line tells the compiler to encode UTF-8 strings; just remove it.
And for developing Chrome extensions, you also need to add a special content security policy declaration in the manifest file:
"content_security_policy": {
"extension_pages": "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'"
}
I'm glad you were able to fix it!
I might still add the enhancement I mentioned earlier to allow for the unicode escape characters when converting the string to binary so that this library is more compatible with these build tools.
The purpose of using this encoding method is to reduce the file size as much as possible. The WASM binary is relatively large when encoded as base64 (which represents every 3 bytes as 4 characters, roughly 33% overhead), while dynEncode reduces the encoding overhead to around 1%.
That is a big saving! I'm looking forward to this enhancement.
@joyqi I've released new versions of each decoder that should fix this issue. UTF escape codes should no longer affect the functionality. Although, I would recommend avoiding the escape codes if reducing file size is important for your use case.
Please let me know if your build works with the UTF-8 escape characters after upgrading to the latest version.
Great work! Now it's working properly.
BTW, is there a plan to implement a webm-opus-decoder? I found that Chrome's MediaRecorder only supports the webm container format, not ogg.
I don't have plans right now to add WebM support. I would need to update codec-parser to demux WebM into Opus frames.
If you have a few example files you could share, I would use them as test cases for implementing WebM parsing.
Here is a sample file: https://darkcoding.net/darkcoding-webm-ebml.webm
WebM uses the EBML format, so we can use an EBML parser to extract the Opus data from the WebM container. I wrote some code to extract the WebM frames, and it works well in my project:
import { OpusDecoder } from "opus-decoder";
import { Decoder as WebmDecoder } from "ts-ebml";

// buf: ArrayBuffer containing the WebM data
const webmDecoder = new WebmDecoder();
const decoder = new OpusDecoder();
await decoder.ready;

const frames: Uint8Array[] = [];
webmDecoder.decode(buf).forEach((element) => {
  if (element.type === "b" && element.name === "SimpleBlock") {
    // skip the 4-byte SimpleBlock header (track number, timecode, flags)
    frames.push((element.data as Uint8Array).slice(4));
  }
});

const decoded = decoder.decodeFrames(frames);
console.log(decoded);
Ref: https://darkcoding.net/software/reading-mediarecorders-webm-opus-output/
Thanks for that example. This is a great solution for anyone who wants to decode WebM Opus files right now.
One small caveat: there are a few Opus parameters described in the Opus header that may not match the defaults set up in OpusDecoder. This should be fine in most scenarios, though, as long as the audio sounds correct.
I'll take a look at that EBML parser and see if I want to use it or implement my own in the codec-parser library.
Yes, I did omit some steps. The better approach would be to read the meta information like sampleRate, channels, etc. from the header and then initialize the opus-decoder.
BTW, I have another question. I ultimately need audio data with a 16 kHz sampling rate and 1 channel. If I forcefully initialize opus-decoder with these parameters, but the input audio is 48 kHz with 2 channels, what would happen?
Initializing the decoder with a lower sample rate will result in the audio being down-sampled per the Opus spec. You can find the valid sample rates here.
Initializing the decoder with a channel count that doesn't match the stream configuration might result in errors or unexpected output. If you want to change the number of channels, you will need to process the output after decoding.
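Since the decoder can down-sample (Opus supports 8000, 12000, 16000, 24000, and 48000 Hz) but won't remix channels for you, a post-processing step can average the two channels into one. A minimal sketch, assuming the decoded output has the per-channel Float32Array layout that decodeFrames returns:

```typescript
// Minimal sketch: averages left/right into a single mono channel.
// Assumption: channelData is an array of Float32Array, one per channel.
function downmixToMono(channelData: Float32Array[]): Float32Array {
  const [left, right] = channelData;
  if (!right) return left; // input is already mono
  const mono = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    mono[i] = (left[i] + right[i]) / 2; // simple equal-weight average
  }
  return mono;
}
```

An equal-weight average keeps the result within [-1, 1] as long as the inputs are, so no extra clipping step is needed.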
Thanks for your explanation. I've found a way to merge multiple channels.
I too get those errors with webpack bundles.
Any idea how to implement the suggested fix with webpack?
@Yahav are you referring to the crc validation errors? Here's a working webpack configuration that uses this project as a dependency: https://github.com/eshaz/icecast-metadata-js/blob/15cf791f30b4eb99f4beeebe2dec65c9154f2232/src/icecast-metadata-player/webpack.config.js
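One common culprit in webpack setups is the minifier escaping non-ASCII characters. As a sketch (assuming terser-webpack-plugin is doing the minification; check the linked config for the project's actual settings), Terser's format.ascii_only option controls that behavior:

```typescript
// webpack.config.ts -- sketch only, not the linked project's actual config.
// Keeping `ascii_only` false leaves the extended-ASCII WASM string as
// single-byte characters instead of escaping everything above 0x7F.
import TerserPlugin from "terser-webpack-plugin";

export default {
  optimization: {
    minimizer: [
      new TerserPlugin({
        terserOptions: {
          format: { ascii_only: false }, // do not escape chars > 0x7F
        },
      }),
    ],
  },
};
```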