eshaz/wasm-audio-decoders

Error: WASM string decode failed crc32 validation

joyqi opened this issue · 15 comments

joyqi commented

Hi, I'm using opus-decoder to develop a Chrome extension. When I create an OpusDecoder instance, it throws an exception.

Uncaught (in promise) Error: WASM string decode failed crc32 validation
    at Function.value (WASMAudioDecoderCommon.js:165:37)
    at Function.value (WASMAudioDecoderCommon.js:173:31)
    at Function.value (WASMAudioDecoderCommon.js:30:17)
    at EmscriptenWASM.getModule (EmscriptenWasm.js:316:20)
    at EmscriptenWASM.instantiate (EmscriptenWasm.js:318:5)
    at WASMAudioDecoderCommon.instantiate (WASMAudioDecoderCommon.js:293:26)
    at OpusDecoder._init (OpusDecoder.js:30:24)
    at new OpusDecoder (OpusDecoder.js:238:14)
    at startRecording (offscreen.ts:48:9)

I put a breakpoint here:

ๆˆชๅฑ2023-10-19 13 50 15

The strangest thing is that if I use the ogg-opus-decoder library, it seems to work normally without any errors (of course, because the encoding is incorrect, it outputs an empty result, but the operation is normal). And looking at the source code, ogg-opus-decoder is dependent on the opus-decoder library.

I've created a demo project to reproduce this error, which you can access here: https://github.com/joyqi/opus-decoder-chrx-demo

eshaz commented

Thanks for opening this issue, and for providing details. I can see the problem in your screenshot.

To explain the issue: the WASM binaries are embedded into the JS files as an extended-ASCII encoded string. The string uses almost all of the values 0-255, with each character representing one byte of WASM binary data. It looks like somewhere in your build process the single-byte characters are being transformed into UTF-8 characters, which changes how JavaScript converts these characters back into byte values. This fails the CRC validation since the bytes no longer match what is expected.

To fix this, you'll need to configure your build tool chain to avoid UTF-8 conversion on strings that contain characters outside of the normal ASCII range (0-127). The characters from the extended-ASCII range (128-255) should not be encoded into UTF-8. The rest of your project can remain UTF-8 encoded, the build process just should not modify the WASM string containing the binary data.
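
To illustrate the mismatch described above, here is a minimal TypeScript sketch of a one-character-per-byte embedding. The helper names `bytesToBinaryString` and `binaryStringToBytes` are hypothetical and are not the library's actual code; the sketch only shows why re-encoding characters 128-255 as UTF-8 changes the byte count.

```typescript
// Hypothetical helpers for illustration only; not the library's implementation.

// Store each byte as a single character with the same code (0-255).
function bytesToBinaryString(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Read each character back as one byte.
function binaryStringToBytes(source: string): Uint8Array {
  const bytes = new Uint8Array(source.length);
  for (let i = 0; i < source.length; i++) {
    bytes[i] = source.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Four bytes in the ASCII range followed by two bytes in the 128-255 range.
const original = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x80, 0xff]);
const embedded = bytesToBinaryString(original);

// Reading one byte per character restores the original data.
const roundTripped = binaryStringToBytes(embedded);

// Encoding the same string as UTF-8 spends two bytes on each character above 127.
const utf8 = new TextEncoder().encode(embedded);

console.log(roundTripped.length, utf8.length);
```

In this sketch the six original bytes survive the round trip, while the UTF-8 encoding of the same string is eight bytes long, so any checksum computed over it would no longer match.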

Here's a screenshot of the debugger running in the demo page paused at the same place the debugger is in your screenshot. The difference here is in the byteIndex and source variables showing up in the Local scope viewer on the right.

  • The byteIndex is larger in your example because the single-byte characters are being escaped into UTF-8 characters, which take two bytes for values in the extended-ASCII range (128-255).
  • The source string in your example has every character in the extended-ASCII range (128-255) replaced with a Unicode escape sequence.

image

I might also be able to add an enhancement to the library I use to encode and decode the WASM string to work around this, if you can't update your tool chain to avoid the UTF-8 conversion.

joyqi commented

Thanks for your reply. Based on your tips, I have some leads. I'm using the Plasmo framework to develop Chrome extensions, which uses Parcel to build the project. In the Plasmo code, I found that it calls swc to compile and bundle the code. Indeed, there are some discussions about the transformation of non-ASCII strings in the swc project (like: swc-project/swc#1744 swc-project/swc#1188). I think this is where the problem lies, but I currently can't find a way to turn it off in the project. I will keep trying.

For the opus-decoder, I wonder if the strings of the WASM file could be encoded in base64, although this would significantly increase the file size.

joyqi commented

UPDATE: I've found a hacky way to avoid the string transformation. This line tells the compiler to encode UTF-8 strings, so just remove it: https://github.com/PlasmoHQ/plasmo/blob/main/core/parcel-config/index.json#L66

For developing Chrome extensions, you also need to add a special security policy declaration to the manifest file:

    "content_security_policy": {
      "extension_pages": "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'"
    }

eshaz commented

I'm glad you were able to fix it!

I might still add the enhancement I mentioned earlier to allow for the unicode escape characters when converting the string to binary so that this library is more compatible with these build tools.
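
A workaround along these lines could look roughly like the sketch below. It is not the library's implementation: `unescapeLiteral` is a hypothetical helper, and it assumes the build tool left literal `\xNN` or `\uNNNN` text in the string in place of the original single characters.

```typescript
// Hypothetical helper for illustration only; not the library's actual fix.
// Assumption: escapes appear as literal text such as \x80 or \u00ff inside the string.
function unescapeLiteral(source: string): string {
  return source.replace(
    /\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4}))/g,
    (_match: string, hex2?: string, hex4?: string) =>
      // Exactly one of the two capture groups is defined for each match.
      String.fromCharCode(parseInt(hex2 ?? hex4 ?? "0", 16))
  );
}

// The escaped text below is 12 characters long; after unescaping it is 4.
const escaped = "a\\u00ffb\\x80";
const restored = unescapeLiteral(escaped);

console.log(escaped.length, restored.length);
```

A real decoder would also have to deal with a literal backslash in the data that happens to precede `x` or `u`, which this sketch ignores.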

The purpose of using this encoding method is to reduce the file size as much as possible. The WASM binary is relatively large when encoded as base64, and using dynEncode reduces the encoding overhead to around 1%.
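
For a sense of scale, the arithmetic can be sketched as follows. The 300000-byte binary size is a made-up figure for illustration, and the roughly 1% overhead is simply the number quoted in the comment above, not something measured here. Padded base64 emits four characters for every three input bytes.

```typescript
// Length of padded base64 output for a given number of input bytes.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

const wasmBytes = 300000; // hypothetical binary size, for illustration only

const base64Chars = base64Length(wasmBytes);
const base64Overhead = (base64Chars - wasmBytes) / wasmBytes;

// Size if the encoding overhead were about 1%, as quoted above.
const onePercentChars = Math.round(wasmBytes * 1.01);

console.log(base64Chars, base64Overhead.toFixed(3), onePercentChars);
```

Under these assumptions base64 adds about a third to the embedded size, compared with a few thousand characters at 1% overhead.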

joyqi commented

That is a big saving! I'm looking forward to this enhancement.

eshaz commented

@joyqi I've released new versions of each decoder that should fix this issue. Unicode escape codes should no longer affect the functionality, although I would recommend avoiding the escape codes if reducing file size is important for your use case.

Please let me know if your build works with the UTF-8 escape characters after upgrading to the latest version.

joyqi commented

Great work! Now it's working properly.

BTW, is there a plan to implement a webm-opus-decoder? I found that Chrome's MediaRecorder only supports the WebM format, not Ogg.

eshaz commented

I don't have plans right now to add webm support. I would need to update codec-parser to demux webm into Opus frames.

If you have a few example files you could share, I would use them as test cases for implementing webm parsing.

joyqi commented

Here is a sample file: https://darkcoding.net/darkcoding-webm-ebml.webm

WebM uses the EBML format, so we can use an EBML parser to extract the Opus data from the WebM container.

I wrote some code to extract the WebM frames, and it works well in my project.

import { OpusDecoder } from "opus-decoder";
import { Decoder as WebmDecoder } from "ts-ebml";

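// `buf` is assumed to be an ArrayBuffer holding the WebM file contents.
const webmDecoder = new WebmDecoder();
const decoder = new OpusDecoder();
await decoder.ready; // wait for the WASM module to finish compiling
const frames: Uint8Array[] = [];

webmDecoder.decode(buf).forEach((element) => {
    if (element.type === 'b' && element.name === 'SimpleBlock') {
        // skip the first 4 bytes of block header (track number, timecode, flags)
        frames.push((element.data as Uint8Array).slice(4));
    }
});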

const decoded = decoder.decodeFrames(frames);
console.log(decoded);

Ref: https://darkcoding.net/software/reading-mediarecorders-webm-opus-output/
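
The fixed 4-byte skip above holds when the track number fits in a one-byte EBML variable-length integer. A slightly more general sketch is shown below; `simpleBlockPayload` is a hypothetical helper, and it assumes the block carries a single unlaced frame, so the lacing bits in the flags byte are not inspected.

```typescript
// Hypothetical helper for illustration; assumes one unlaced frame per SimpleBlock.
function simpleBlockPayload(data: Uint8Array): Uint8Array {
  if (data.length === 0) {
    throw new Error("empty SimpleBlock");
  }
  // The track number is an EBML variable-length integer: the count of leading
  // zero bits in its first byte, plus one, gives its width in bytes.
  const trackNumberLength = Math.clz32(data[0]) - 24 + 1;
  if (trackNumberLength > 8) {
    throw new Error("invalid track number");
  }
  // The track number is followed by a 2-byte timecode and a 1-byte flags field.
  const headerLength = trackNumberLength + 2 + 1;
  if (data.length < headerLength) {
    throw new Error("truncated SimpleBlock");
  }
  return data.subarray(headerLength);
}
```

With a one-byte track number the helper skips 4 bytes, matching `slice(4)` in the code above.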

eshaz commented

Thanks for that example. This is a great solution for anyone who wants to decode WebM Opus files right now.

One small caveat, there are a few Opus parameters that are described in the Opus header that may not match the defaults that are set up in OpusDecoder. This should be fine though in most scenarios, as long as the audio sounds correct.

I'll take a look at that EBML parser and see if I want to use it or implement my own in the codec-parser library.

joyqi commented

Yes, I did omit some steps. The better approach would be to read the meta information like sampleRate, channels, etc. from the header and then initialize the opus-decoder.
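
As a rough sketch of that header-reading step, the function below parses the fixed part of an Opus identification header ("OpusHead") using the field layout from RFC 7845. `parseOpusHead` and the `OpusHead` interface are hypothetical names, and the sketch assumes the header bytes have already been extracted from the container (in WebM they are usually carried in the track's CodecPrivate element).

```typescript
// Hypothetical types and helper for illustration; layout follows RFC 7845.
interface OpusHead {
  version: number;
  channels: number;
  preSkip: number;
  inputSampleRate: number;
  outputGain: number;
  channelMappingFamily: number;
}

function parseOpusHead(bytes: Uint8Array): OpusHead {
  if (bytes.length < 19) {
    throw new Error("OpusHead too short");
  }
  const magic = "OpusHead";
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic.charCodeAt(i)) {
      throw new Error("missing OpusHead signature");
    }
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    version: view.getUint8(8),
    channels: view.getUint8(9),
    preSkip: view.getUint16(10, true), // little-endian
    inputSampleRate: view.getUint32(12, true), // little-endian
    outputGain: view.getInt16(16, true), // signed, little-endian
    channelMappingFamily: view.getUint8(18),
  };
}
```

The parsed channel count and pre-skip could then be used when configuring the decoder instead of relying on its defaults.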

BTW, I have another question. I ultimately need to get audio data with a sampling rate of 16 kHz and 1 channel. If I forcefully initialize the opus-decoder with these parameters, but the input audio is 48 kHz with 2 channels, what would happen?

eshaz commented

Initializing the decoder with a lower sample rate will result in the audio being down-sampled per the Opus spec. You can find the valid sample rates here.

Initializing the decoder with a channel count that doesn't match the stream configuration might result in errors or an unexpected result. If you want to change the number of channels, you will need to process the output after decoding.
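
One simple form of that post-processing is to average the channels into a single mono channel. The sketch below assumes the decoded output provides one equal-length Float32Array per channel; `downmixToMono` is a hypothetical helper and is not part of the library.

```typescript
// Hypothetical helper for illustration: average all channels into one.
// Assumes every channel has the same number of samples.
function downmixToMono(channelData: Float32Array[]): Float32Array {
  if (channelData.length === 0) {
    return new Float32Array(0);
  }
  const sampleCount = channelData[0].length;
  const mono = new Float32Array(sampleCount);
  for (let c = 0; c < channelData.length; c++) {
    const channel = channelData[c];
    for (let i = 0; i < sampleCount; i++) {
      // Each channel contributes an equal share of the output sample.
      mono[i] += channel[i] / channelData.length;
    }
  }
  return mono;
}
```

Plain averaging keeps the result within the original amplitude range; other downmix weightings are possible depending on the use case.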

joyqi commented

Thanks for your explanation. I've found a way to merge multiple channels.

I too get those errors with webpack bundles.
Any idea how to implement the suggested fix with webpack?

@Yahav are you referring to the crc validation errors? Here's a working webpack configuration that uses this project as a dependency: https://github.com/eshaz/icecast-metadata-js/blob/15cf791f30b4eb99f4beeebe2dec65c9154f2232/src/icecast-metadata-player/webpack.config.js