eshaz/wasm-audio-decoders

Demo failing on certain files.

EricTetz opened this issue · 5 comments

So I noticed your speedy work on adding multichannel support. It works great with the test file we added here. Awesome work.

I have a few files closer to my use case that fail in the demo. They were created using this script. The third one is actually a snippet of the aforementioned test file, chopped out by that script. They all play fine with ffplay and Reaper (screenshots are from Reaper), but fail in the demo.

Broken1
image

Broken2
image

Broken3
image

broken.zip

eshaz commented

These files are very short. The 2nd and 3rd files in your upload each contain 4 Opus frames per channel, totaling 3840 samples (80 milliseconds at 48000 Hz). I put these files through my test code in codec-parser to see what data was in there.
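
For reference, the arithmetic behind those numbers as a minimal sketch. The 20 ms frame size is taken from the `frameSize` field in the parsed output in this comment; the helper name `samplesForFrames` is made up for illustration.

```javascript
// Hypothetical helper: samples per channel for a run of fixed-size Opus frames.
function samplesForFrames(frameCount, frameSizeMs, sampleRate = 48000) {
  // Opus frame sizes are expressed in milliseconds, so convert to samples first.
  const samplesPerFrame = (frameSizeMs * sampleRate) / 1000;
  return frameCount * samplesPerFrame;
}

const shortFileSamples = samplesForFrames(4, 20); // 4 frames of 20 ms each
const shortFileMs = (shortFileSamples * 1000) / 48000;
console.log(shortFileSamples, shortFileMs); // 3840 80
```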

Here's what broken3.opus looked like. See RFC 7845 pg. 3 for info on how an Ogg Opus file is structured.

broken3.opus
[
  // Ogg Identification Page
  {
    "codecFrames": [],
    "crc32": 754276737,
    "duration": 0,
    "isContinuedPacket": false,
    "isFirstPage": true,
    "isLastPage": false,
    "pageSequenceNumber": 0,
    "samples": 0,
    "streamSerialNumber": 313345562,
    "totalSamples": 0,
    "totalDuration": 0,
    "totalBytesOut": 0
  },
  // Ogg Comment Page
  {
    "codecFrames": [],
    "crc32": 1292065918,
    "duration": 0,
    "isContinuedPacket": false,
    "isFirstPage": false,
    "isLastPage": false,
    "pageSequenceNumber": 1,
    "samples": 0,
    "streamSerialNumber": 313345562,
    "totalSamples": 0,
    "totalDuration": 0,
    "totalBytesOut": 0
  },
  // Ogg Audio Page with 4 Opus frames
  {
    "codecFrames": [
      {
        "header": {
          "bitDepth": 16,
          "bitrate": 2152,
          "channels": 16,
          "sampleRate": 48000,
          "bandwidth": "fullband",
          "channelMappingFamily": 255,
          "channelMappingTable": [
            0,
            1,
            2,
            3,
            4,
            5,
            6,
            7,
            8,
            9,
            10,
            11,
            12,
            13,
            14,
            15
          ],
          "coupledStreamCount": 0,
          "frameCount": 1,
          "frameSize": 20,
          "inputSampleRate": 48000,
          "mode": "CELT-only",
          "outputGain": 0,
          "preSkip": 312,
          "streamCount": 16
        },
        "samples": 960,
        "duration": 20,
        "frameNumber": 0,
        "totalBytesOut": 0,
        "totalSamples": 0,
        "totalDuration": 0,
        "crc32": 1805224814
      },
      {
        "header": {
          "bitDepth": 16,
          "bitrate": 1904,
          "channels": 16,
          "sampleRate": 48000,
          "bandwidth": "fullband",
          "channelMappingFamily": 255,
          "channelMappingTable": [
            0,
            1,
            2,
            3,
            4,
            5,
            6,
            7,
            8,
            9,
            10,
            11,
            12,
            13,
            14,
            15
          ],
          "coupledStreamCount": 0,
          "frameCount": 1,
          "frameSize": 20,
          "inputSampleRate": 48000,
          "mode": "CELT-only",
          "outputGain": 0,
          "preSkip": 312,
          "streamCount": 16
        },
        "samples": 960,
        "duration": 20,
        "frameNumber": 1,
        "totalBytesOut": 5386,
        "totalSamples": 960,
        "totalDuration": 20,
        "crc32": 912092372
      },
      {
        "header": {
          "bitDepth": 16,
          "bitrate": 1680,
          "channels": 16,
          "sampleRate": 48000,
          "bandwidth": "fullband",
          "channelMappingFamily": 255,
          "channelMappingTable": [
            0,
            1,
            2,
            3,
            4,
            5,
            6,
            7,
            8,
            9,
            10,
            11,
            12,
            13,
            14,
            15
          ],
          "coupledStreamCount": 0,
          "frameCount": 1,
          "frameSize": 20,
          "inputSampleRate": 48000,
          "mode": "CELT-only",
          "outputGain": 0,
          "preSkip": 312,
          "streamCount": 16
        },
        "samples": 960,
        "duration": 20,
        "frameNumber": 2,
        "totalBytesOut": 10144,
        "totalSamples": 1920,
        "totalDuration": 40,
        "crc32": 736132645
      },
      {
        "header": {
          "bitDepth": 16,
          "bitrate": 1680,
          "channels": 16,
          "sampleRate": 48000,
          "bandwidth": "fullband",
          "channelMappingFamily": 255,
          "channelMappingTable": [
            0,
            1,
            2,
            3,
            4,
            5,
            6,
            7,
            8,
            9,
            10,
            11,
            12,
            13,
            14,
            15
          ],
          "coupledStreamCount": 0,
          "frameCount": 1,
          "frameSize": 20,
          "inputSampleRate": 48000,
          "mode": "CELT-only",
          "outputGain": 0,
          "preSkip": 312,
          "streamCount": 16
        },
        "samples": 960,
        "duration": 20,
        "frameNumber": 3,
        "totalBytesOut": 14339,
        "totalSamples": 2880,
        "totalDuration": 60,
        "crc32": 719328457
      }
    ],
    "crc32": 56315346,
    "duration": 80,
    "isContinuedPacket": false,
    "isFirstPage": false,
    "isLastPage": false,
    "pageSequenceNumber": 57,
    "samples": 3840,
    "streamSerialNumber": 313345562,
    "totalSamples": 3840,
    "totalDuration": 80,
    "totalBytesOut": 18534
  }
]
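
Several page-level fields in the dump above (`isContinuedPacket`, `isFirstPage`, `isLastPage`, `streamSerialNumber`, `pageSequenceNumber`, `crc32`) map directly onto the fixed Ogg page header. The following is a rough sketch of reading that header, following the layout in RFC 3533. It is not codec-parser's actual code, and the function name `parseOggPageHeader` is an assumption for illustration.

```javascript
// Sketch: parse the 27-byte fixed Ogg page header plus its segment table.
// `bytes` is a Uint8Array positioned so that `offset` is the start of a page.
function parseOggPageHeader(bytes, offset = 0) {
  const view = new DataView(bytes.buffer, bytes.byteOffset + offset);
  // Every page starts with the capture pattern "OggS".
  if (view.getUint32(0, false) !== 0x4f676753) {
    throw new Error("Not an Ogg page");
  }
  const flags = view.getUint8(5); // header type flags
  const segmentCount = view.getUint8(26);
  // The page body length is the sum of the lacing values in the segment table.
  let bodyLength = 0;
  for (let i = 0; i < segmentCount; i++) bodyLength += view.getUint8(27 + i);
  return {
    isContinuedPacket: (flags & 0x01) !== 0,
    isFirstPage: (flags & 0x02) !== 0, // beginning of stream
    isLastPage: (flags & 0x04) !== 0, // end of stream
    granulePosition: view.getBigInt64(6, true), // little-endian, signed
    streamSerialNumber: view.getUint32(14, true),
    pageSequenceNumber: view.getUint32(18, true),
    crc32: view.getUint32(22, true),
    headerLength: 27 + segmentCount,
    bodyLength,
  };
}

// Synthetic first page with two segments (255 + 10 bytes of body).
const fakePage = new Uint8Array(27 + 2);
fakePage.set([0x4f, 0x67, 0x67, 0x53]); // "OggS"
fakePage[5] = 0x02; // beginning-of-stream flag
new DataView(fakePage.buffer).setUint32(14, 313345562, true);
fakePage[26] = 2;
fakePage[27] = 255;
fakePage[28] = 10;
const fakeHeader = parseOggPageHeader(fakePage);
console.log(fakeHeader.isFirstPage, fakeHeader.bodyLength);
```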

You did find a bug in my code that was preventing these short files from being decoded. I've fixed it locally, and I'll get a release here soon for that.

I think I found a bug / design problem in that tool you are using though. The Opus codec always starts with a bit of silence, called the pre-skip, that populates the decoder's state before actual audio starts. When encountering a new Opus stream (i.e. a new Ogg Opus file), the decoder should "decode, but discard" these samples, see RFC 7845 pg. 9. Since that tool is splicing the data by Ogg pages, it's probably not accounting for that pre-skip value, and a normal decoder will lose those pre-skip samples.

If you are splicing the data, especially into pieces this small, then you will quickly start to lose audio and become out of sync / hear clicks and pops where the missing audio is. Currently, ogg-opus-decoder always discards those pre-skip samples. I'm not sure this audio would be accurate if it were returned.
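
To make the "decode, but discard" step concrete, here is a minimal sketch. It assumes the decoded output is an array of `Float32Array`, one per channel. That shape and the name `discardPreSkip` are assumptions for illustration, not ogg-opus-decoder's internals.

```javascript
// Sketch: drop the first `preSkip` samples from every decoded channel.
// `channelData` is assumed to be an array of Float32Array, one per channel.
function discardPreSkip(channelData, preSkip) {
  return channelData.map((samples) =>
    // subarray returns a view, so no audio data is copied.
    samples.subarray(Math.min(preSkip, samples.length))
  );
}

// Two channels of 3840 decoded samples, with the pre-skip of 312 from the header.
const decodedChannels = [new Float32Array(3840), new Float32Array(3840)];
const trimmedChannels = discardPreSkip(decodedChannels, 312);
console.log(trimmedChannels[0].length); // 3528
```

Applied to every spliced piece, this drops 312 samples (6.5 ms at 48000 Hz) from the start of each one, which is where the gaps would come from.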

Can you verify what the total sample count should be for each of your files? Does it match the total samples, or decoded samples value below?

file          total samples  decoded samples  pre-skip samples
broken1.opus  11520          11208            312
broken2.opus  3840           3528             312
broken3.opus  3840           3528             312

eshaz commented

I've released ogg-opus-decoder/1.4.2 which fixes this bug with decoding small files. Thanks for letting me know about that!

EricTetz commented

I can't speak to the pre-skip samples. Never done audio programming until a few days ago, so I'm poking around in the dark here. That said, I made a test app that appears to work.

Here's a super crude proof of concept. It fetches an Opus file, then parses it using the aforementioned OpusFileSplitter (which I've renamed to just OpusFile). I grab 50K chunks, decode them using your decoder, and add them to a queue. To seek around in the file, I just change the Opus page offset and start rebuffering chunks.
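
One way to picture the chunking and seeking described here, as a rough sketch. It assumes the parser exposes a sorted list of page start offsets. The names `pageOffsets`, `buildChunks`, and `chunkIndexForPage` are hypothetical and are not part of OpusFileSplitter or ogg-opus-decoder. It also ignores details such as how the header pages reach the decoder.

```javascript
// Group consecutive Ogg pages into byte ranges of roughly `targetSize` bytes,
// always cutting on a page boundary so every chunk starts with a whole page.
function buildChunks(pageOffsets, fileLength, targetSize = 50 * 1024) {
  const chunks = [];
  let start = 0; // index of the first page in the chunk being built
  for (let i = 1; i <= pageOffsets.length; i++) {
    const isLast = i === pageOffsets.length;
    const end = isLast ? fileLength : pageOffsets[i];
    if (end - pageOffsets[start] >= targetSize || isLast) {
      chunks.push({ firstPage: start, byteStart: pageOffsets[start], byteEnd: end });
      start = i;
    }
  }
  return chunks;
}

// Seeking: find the chunk that contains a given page, then rebuffer from there.
function chunkIndexForPage(chunks, pageIndex) {
  let found = 0;
  for (let c = 0; c < chunks.length; c++) {
    if (chunks[c].firstPage <= pageIndex) found = c;
  }
  return found;
}

const demoOffsets = [0, 100, 30000, 60000, 90000, 120000];
const demoChunks = buildChunks(demoOffsets, 130000);
console.log(demoChunks.length, chunkIndexForPage(demoChunks, 4));
```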

The next step is to move the chunk fetching to a service call, so rather than loading the entire file into memory, which is untenable for a 4 hour recording, I can just stream the parts I need.

eshaz commented

This seems to work pretty well and is very responsive. Let me know how the rest of it goes. Will this application be published as open source when you're done?

EricTetz commented

I learned enough last night to build a crude mixer (gain, pan, reverb send on each channel). I struggled all day just to mix my 16 channel source to mono, then started to get the API and things fell into place much faster. I think there are no more technical barriers to building what I want. Just a huge amount of work to flesh out the pieces and harden them.
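
For readers following along, a mono mixdown with per-channel gain can be sketched on plain arrays as below. The post does not say which API the mixer was built on, so this is only an illustration of the idea. `mixToMono` and its parameters are hypothetical names.

```javascript
// Sketch: sum N channels (Float32Array each, equal length) into one mono
// buffer, applying a linear gain per channel. Missing gains default to 1.
function mixToMono(channels, gains) {
  const length = channels.length ? channels[0].length : 0;
  const out = new Float32Array(length);
  channels.forEach((samples, ch) => {
    const gain = gains?.[ch] ?? 1;
    for (let i = 0; i < length; i++) out[i] += samples[i] * gain;
  });
  return out;
}

const monoMix = mixToMono(
  [new Float32Array([0.5, -0.5]), new Float32Array([0.25, 0.25])],
  [1, 2]
);
console.log(Array.from(monoMix)); // [ 1, 0 ]
```

With 16 channels summed at unity gain the result can easily exceed the -1 to 1 range, so a real mix would scale the gains down or apply a limiter.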

I need to make metadata for the sessions, which includes which files are in the session, track names, and a rough mix for the session (trim, gain, pan, low pass, compression, reverb settings for each channel). I need to write the back end that streams the chunks, and add the ability to do so across file boundaries. I'm going to preprocess the Opus files in the session and save out the page indices, so I don't have to scan the files every time someone loads a session. This will also let me host files in something like AWS. I can use my index to determine the bytes I need, then do a range request.
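
A sketch of the index-to-range-request idea, under stated assumptions: each index entry is `{ byteOffset, granulePosition }` for one Ogg page, sorted by offset, where `granulePosition` is the running sample count (at 48 kHz) at the end of that page. The entry shape and the name `rangeForTime` are hypothetical. Pre-skip and file boundaries are ignored here.

```javascript
// Sketch: turn a time window into an HTTP Range header value using a
// precomputed page index. Assumes `index` is non-empty and sorted.
function rangeForTime(index, fileLength, startSec, endSec, sampleRate = 48000) {
  const startSample = startSec * sampleRate;
  const endSample = endSec * sampleRate;
  // First page whose end position lies past the requested start.
  let first = index.findIndex((p) => p.granulePosition > startSample);
  if (first === -1) first = index.length - 1;
  // First page whose end position reaches the requested end.
  let last = index.findIndex((p) => p.granulePosition >= endSample);
  if (last === -1) last = index.length - 1;
  const byteStart = index[first].byteOffset;
  const byteEnd = last + 1 < index.length ? index[last + 1].byteOffset : fileLength;
  // HTTP byte ranges are inclusive at both ends.
  return `bytes=${byteStart}-${byteEnd - 1}`;
}

const demoIndex = [
  { byteOffset: 0, granulePosition: 0 }, // identification header page
  { byteOffset: 200, granulePosition: 0 }, // comment header page
  { byteOffset: 400, granulePosition: 48000 },
  { byteOffset: 5400, granulePosition: 96000 },
  { byteOffset: 10400, granulePosition: 144000 },
];
console.log(rangeForTime(demoIndex, 15000, 1.0, 2.0)); // bytes=5400-10399
```

The returned string can be passed as the `Range` header of a `fetch` call against any server or object store that supports byte-range requests.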

Whether I open source it depends on how generalized I can make it. If it ends up being a huge hack job that only really works for my band recordings, I'll be more shy about sharing. That said, I literally couldn't build it without all the open source stuff I've been able to reference, and a lot of it didn't have to be polished to be helpful.

Your repo is a core enabling technology. Thanks so much for your help!