WebAudio/web-audio-api-v2

No way to convert data from WebCodecs AudioData to AudioBuffer

guest271314 opened this issue · 7 comments

Describe the feature
WebCodecs defines AudioData. In the WebCodecs specification this note appears:

NOTE: The Web Audio API currently uses f32-planar exclusively.

However, the format of AudioData output by AudioDecoder is 'f32', not 'f32-planar'.

Even when the sampleRate set in the AudioDecoder configuration is something other than 48000 (and opusenc supports a --raw-rate option to explicitly set the sample rate for Opus-encoded audio), the resulting WebCodecs AudioData instance always has sampleRate set to 48000.

The effective result is that there is no way that I am aware of to convert the data from AudioData.copyTo(ArrayBuffer, {planeIndex: 0}) to an AudioBuffer instance that can be played with AudioBufferSourceNode or resampled to a different sampleRate, for example, 22050.

Since MediaStreamTrackGenerator suffers from "overflow", and the WebCodecs specification defines no algorithm to handle that overflow other than one supplied by the user, it is necessary for the user to write that algorithm. After testing, a user might find a magic number to delay the next call to write() on the WritableStreamDefaultWriter obtained from MediaStreamTrackGenerator.writable (https://plnkr.co/edit/clbdVbhaRhCKWmPS), but that approach does not achieve the same result when attempting to use a Web Audio API AudioBuffer and AudioBufferSourceNode.
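For reference, a minimal sketch of the magic-number delay approach, assuming frames is a queue of AudioData produced by AudioDecoder.output; the 50 ms value is only illustrative, the actual delay has to be found by testing, as in the plnkr above:

const generator = new MediaStreamTrackGenerator({ kind: 'audio' });
const writer = generator.writable.getWriter();
// inside an async function; frames is a queue of AudioData from AudioDecoder.output
for (const frame of frames) {
  await writer.write(frame);
  // delay the next write() so the generator's internal buffer does not overflow;
  // 50 is a placeholder for the "magic number" found by testing
  await new Promise((resolve) => setTimeout(resolve, 50));
}

The attempt to do the equivalent with AudioBuffer and AudioBufferSourceNode is: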

async function main() {
  const oac = new AudioContext({
    sampleRate: 48000,
  });
  let channelData = [];
  const decoder = new AudioDecoder({
    error(e) {
      console.error(e);
    },
    async output(frame) {
      const { duration: d } = frame;
      const size = frame.allocationSize({ planeIndex: 0 });
      const data = new ArrayBuffer(size);
      frame.copyTo(data, { planeIndex: 0 });
      const view = new Float32Array(data);
      for (let i = 0; i < view.length; i++) {
        if (channelData.length === 220) {
          const floats = new Float32Array(220);
          floats.set(channelData.splice(0, 220));
          const ab = new AudioBuffer({
            sampleRate: 48000,
            length: floats.length,
            numberOfChannels: 1,
          });
          ab.getChannelData(0).set(floats);
          const source = new AudioBufferSourceNode(oac, { buffer: ab });
          source.connect(oac.destination);
          console.log(ab.duration, ab.sampleRate);
          source.start();
          await new Promise((r) => {
            console.log(ab);
            source.onended = r;
          });
        }
        channelData.push(view[i]);
      }
      if (decoder.decodeQueueSize === 0) {
        if (channelData.length) {
          const floats = new Float32Array(220);
          floats.set(channelData.splice(0, 220));
          const ab = new AudioBuffer({
            sampleRate: 48000,
            length: floats.length,
            numberOfChannels: 1,
          });
          ab.getChannelData(0).set(floats);
          console.log(ab.duration, ab.sampleRate);
          const source = new AudioBufferSourceNode(oac, { buffer: ab });
          source.connect(oac.destination);
          source.start();
          await new Promise((r) => (source.onended = r));
          await decoder.flush();
          return;
        }
      }
    },
  });

  const encoded = await (await fetch('./encoded.json')).json();
  let base_time = encoded[encoded.length - 1].timestamp;
  console.assert(encoded.length > 0, encoded.length);
  console.log(JSON.stringify(encoded, null, 2));
  const metadata = encoded.shift();
  console.log(encoded[encoded.length - 1].timestamp, base_time);
  metadata.decoderConfig.description = new Uint8Array(
    base64ToBytesArr(metadata.decoderConfig.description)
  ).buffer;
  console.log(await AudioDecoder.isConfigSupported(metadata.decoderConfig));
  decoder.configure(metadata.decoderConfig);
  while (encoded.length) {
    const chunk = encoded.shift();
    chunk.data = new Uint8Array(base64ToBytesArr(chunk.data)).buffer;
    const eac = new EncodedAudioChunk(chunk);
    decoder.decode(eac);
  }
}
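The base64ToBytesArr() helper used above decodes the base64-encoded chunk data; one possible implementation (an assumption, the original helper is not shown in this issue) is:

function base64ToBytesArr(base64) {
  // decode the base64 string and return an array of byte values
  const binary = atob(base64);
  const bytes = [];
  for (let i = 0; i < binary.length; i++) {
    bytes.push(binary.charCodeAt(i));
  }
  return bytes;
}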

This verifies that the AudioData data and AudioBuffer channel data are incompatible.

Is there a prototype?
No.

Describe the feature in more detail

Web Audio API AudioBuffer <=> WebCodecs AudioData

Provide an algorithm and method to convert WebCodecs AudioData to Web Audio API AudioBuffer, with an option to set the sample rate of the resulting object.
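For example, something along these lines, where the method name and option are purely hypothetical and only meant to illustrate the request:

// hypothetical conversion method, not an existing API
const audioBuffer = await audioData.toAudioBuffer({
  sampleRate: 22050, // resample the decoded data to the requested rate
});
const source = new AudioBufferSourceNode(audioContext, { buffer: audioBuffer });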

I used OfflineAudioContext to resample the hard-coded 48000 sample rate and numberOfFrames (2568 for the first and 2880 for the remainder of the AudioData objects output by AudioDecoder).

https://chromium.googlesource.com/chromium/src/+/49cf62132c057a79b093c8b5ab72f195cac447cc/media/audio/audio_opus_encoder.cc#32

// For Opus, we try to encode 60ms, the maximum Opus buffer, for quality
// reasons.
constexpr int kOpusPreferredBufferDurationMs = 60;

https://chromium.googlesource.com/chromium/src/+/49cf62132c057a79b093c8b5ab72f195cac447cc/media/audio/audio_opus_encoder.cc#58

// default preferred 48 kHz. If the input sample rate is anything else, we'll
// use 48 kHz.
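Those two constants account for the frame counts observed above: 60 ms of audio at the fixed 48 kHz rate is 48000 × 0.06 = 2880 frames per AudioData.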

something like

  const TARGET_FRAME_SIZE = 220;
  const TARGET_SAMPLE_RATE = 22050;
  // ...
  const config = {
    numberOfChannels: 1,
    sampleRate: 22050, // Chrome hardcodes to 48000
    codec: 'opus',
    bitrate: 16000,
  };
  encoder.configure(config);
  const decoder = new AudioDecoder({
    error(e) {
      console.error(e);
    },
    async output(frame) {
      ++chunk_length;
      const { duration, numberOfChannels, numberOfFrames, sampleRate } = frame;
      const size = frame.allocationSize({ planeIndex: 0 });
      const data = new ArrayBuffer(size);
      frame.copyTo(data, { planeIndex: 0 });
      const buffer = new AudioBuffer({
        length: numberOfFrames,
        numberOfChannels,
        sampleRate,
      });
      buffer.getChannelData(0).set(new Float32Array(data));
      // https://stackoverflow.com/a/27601521
      const oac = new OfflineAudioContext(
        buffer.numberOfChannels,
        buffer.duration * TARGET_SAMPLE_RATE,
        TARGET_SAMPLE_RATE
      );
      // Play it from the beginning.
      const source = new AudioBufferSourceNode(oac, {
        buffer,
      });
      source.connect(oac.destination);
      source.start();
      const ab = (await oac.startRendering()).getChannelData(0);
      for (let i = 0; i < ab.length; i++) {
        if (channelData.length === TARGET_FRAME_SIZE) {
          const floats = new Float32Array(
            channelData.splice(0, TARGET_FRAME_SIZE)
          );
          decoderController.enqueue(floats);
        }
        channelData.push(ab[i]);
      }
      if (chunk_length === len) {
        if (channelData.length) {
          const floats = new Float32Array(TARGET_FRAME_SIZE);
          floats.set(channelData.splice(0, channelData.length));
          decoderController.enqueue(floats);
          decoderController.close();
          decoderResolve();
        }
      }
    },
  });

The audio playback quality is sub-par when resampling from 48000 to 22050. What is the suggested procedure to produce quality audio without glitches, gaps, or faster- or slower-rate frames when converting from WebCodecs AudioData to AudioBuffer, for the purpose of breaking out of the hard-coded box of Chrome's WebCodecs implementation?

webcodecs-serialize-to-json-deserialize-json.zip

The current design direction is to be able to create AudioBuffer objects directly from typed arrays, and to allow AudioBuffer to internally use more data types than f32. For now, authors need to create an AudioBuffer of the same size, use AudioData.copyTo to copy to an intermediate ArrayBuffer, and then copy (with possible conversion) to the AudioBuffer. This is wasteful and not ergonomic.
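A minimal sketch of that intermediate-copy path, assuming frame is a mono f32 AudioData from AudioDecoder.output as in the snippets above:

const size = frame.allocationSize({ planeIndex: 0 });
const intermediate = new ArrayBuffer(size);
frame.copyTo(intermediate, { planeIndex: 0 });
const audioBuffer = new AudioBuffer({
  length: frame.numberOfFrames,
  numberOfChannels: frame.numberOfChannels,
  sampleRate: frame.sampleRate,
});
// copy (with possible conversion) into the AudioBuffer's channel data
audioBuffer.getChannelData(0).set(new Float32Array(intermediate));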

Another design direction is to be able to get the memory of an AudioData, and directly construct an AudioBuffer from this memory, skipping all copies (w3c/webcodecs#287).

There are several issues.

  • AudioData at AudioDecoder.output is absolutely dissimilar from the input AudioData at AudioEncoder.encode - developers use codecs for compression, not for implementation restrictions. We might as well just use opusenc directly, or in WASM form, if we cannot do the equivalent of opusenc --raw-rate 22050 input.wav output.opus in the AudioEncoder configuration - the options are ignored by the implementation. I can use Native Messaging, or fetch() reliably, or WebTransport far less reliably, to stream input from the browser and get STDOUT from a native application.
  • Further resampling and conversion to data of lesser length is needed to accommodate MediaStreamTrackGenerator, where the AudioData output from an OscillatorNode connected to a MediaStreamAudioDestinationNode and processed with MediaStreamTrackProcessor is also dissimilar to the WebCodecs AudioDecoder.decode() output at the output callback - with the same input.
  • AudioWorklet results in fewer glitches than MediaStreamTrackGenerator - when the input is processed in a Worker, WebAssembly.Memory.grow() is used - because when multiple ReadableStreams are being processed in parallel and piped through and to other streams on the same thread, one can take priority and cause glitches in initial playback until the input is completely read. However, the only way to get an AudioWorklet instance is via an ECMAScript module, which limits usage due to CSP, and AudioWorklet does not expose fetch() or WebTransport - thus, use a single memory with the ability to grow; WebAssembly collects garbage - calling MediaStreamTrackGenerator stop() can crash the tab.

In summary, there needs to be consistency between these burgeoning APIs so that user-defined conversion is not necessary, or so that if the user does decide to convert between AudioData and AudioBuffer, it can be done "seamlessly"; WebCodecs has free rein to do whatever it wants - why would the decoder only output a 48000 sample rate when I deliberately input a 22050 sample rate and 1 channel in the configuration? That is inviting user-defined conversion (and the issues that come with it).

I updated and tested the code using OfflineAudioContext a few hundred more times and compared it to creating a WAV file using the data from AudioData.copyTo():

// https://github.com/higuma/wav-audio-encoder-js
class WavAudioEncoder {
  constructor({ buffers, sampleRate, numberOfChannels }) {
    Object.assign(this, {
      buffers,
      sampleRate,
      numberOfChannels,
      numberOfSamples: 0,
      dataViews: [],
    });
  }
  setString(view, offset, str) {
    const len = str.length;
    for (let i = 0; i < len; i++) {
      view.setUint8(offset + i, str.charCodeAt(i));
    }
  }
  async encode() {
    const [{ length }] = this.buffers;
    const data = new DataView(
      new ArrayBuffer(length * this.numberOfChannels * 2)
    );
    let offset = 0;
    for (let i = 0; i < length; i++) {
      for (let ch = 0; ch < this.numberOfChannels; ch++) {
        let x = this.buffers[ch][i] * 0x7fff;
        data.setInt16(
          offset,
          x < 0 ? Math.max(x, -0x8000) : Math.min(x, 0x7fff),
          true
        );
        offset += 2;
      }
    }
    this.dataViews.push(data);
    this.numberOfSamples += length;
    const dataSize = this.numberOfChannels * this.numberOfSamples * 2;
    const view = new DataView(new ArrayBuffer(44));
    // RIFF/WAVE header for 16-bit PCM
    this.setString(view, 0, 'RIFF');
    view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
    this.setString(view, 8, 'WAVE');
    this.setString(view, 12, 'fmt ');
    view.setUint32(16, 16, true); // fmt chunk size
    view.setUint16(20, 1, true); // audio format: PCM
    view.setUint16(22, this.numberOfChannels, true);
    view.setUint32(24, this.sampleRate, true);
    view.setUint32(28, this.sampleRate * this.numberOfChannels * 2, true); // byte rate
    view.setUint16(32, this.numberOfChannels * 2, true); // block align
    view.setUint16(34, 16, true); // bits per sample
    this.setString(view, 36, 'data');
    view.setUint32(40, dataSize, true);
    this.dataViews.unshift(view);
    return new Blob(this.dataViews, { type: 'audio/wav' }).arrayBuffer();
  }
}
// ...
const wav = new WavAudioEncoder({
  sampleRate: 48000,
  numberOfChannels: 1,
  buffers: [new Float32Array(data)],
});
const ab = (await ac.decodeAudioData(await wav.encode())).getChannelData(0);

Glitches can occasionally occur at the beginning of the OfflineAudioContext playback. No glitches occur when creating WAV headers and prepending the headers to the data. Test and compare the differences for yourself: https://guest271314.github.io/webcodecs/.

Are these the simplest approaches to resample the output from AudioDecoder.decode()?

The important point is that it is only necessary to resample the data from AudioData at AudioDecoder.output because WebCodecs does not honor the AudioEncoder or AudioDecoder configuration and resamples to 48000, and outputs a numberOfFrames far greater than the input numberOfFrames, which is inconsistent behaviour.

If there was consistency between WebCodecs AudioEncoder.output and AudioDecoder.output with regard to AudioData there would be no need to resample with Web Audio API.

Two things:

  • This approach to resample segments of audio with an OfflineAudioContext cannot work. Non-naive audio resampling is a stateful operation, and creating a new OfflineAudioContext each time doesn't allow keeping any state. Resampling using an OfflineAudioContext only works if the entirety of the audio is resampled in one operation (see the sketch after this list).

  • Resampling to another rate is not in the scope of Web Codecs. Web Codecs is just about decoding and encoding, and resampling the audio to play it out is expected for now, since there is no resampler object in the Web Platform yet. Opus always works at 48 kHz internally, and by default always decodes to 48 kHz, so this is what you see in Web Codecs. For other codecs, you'll see that the rate is (usually) the rate of the input stream.
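A sketch of what the first point above implies, assuming all decoded samples have been collected into a single Float32Array at 48000 Hz, mono, before resampling:

// resample the entire decoded stream in one OfflineAudioContext operation
async function resampleAll(samples, targetRate = 22050) {
  const buffer = new AudioBuffer({
    length: samples.length,
    numberOfChannels: 1,
    sampleRate: 48000,
  });
  buffer.getChannelData(0).set(samples);
  const oac = new OfflineAudioContext(
    1,
    Math.ceil(buffer.duration * targetRate),
    targetRate
  );
  const source = new AudioBufferSourceNode(oac, { buffer });
  source.connect(oac.destination);
  source.start();
  // the rendered AudioBuffer contains the whole stream at the target rate
  return oac.startRendering();
}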

The problem is resampling is necessary based on WebCodecs output.

All you need to do is test the output of AudioDecoder and try to pass that AudioData directly to a MediaStreamTrackGenerator. One of two outcomes currently exists without user-defined intervention:

I can do $ opusenc --raw-rate 22050 input.wav output.opus and get the output I set. WebCodecs ignores the configuration, yet claims "flexibility". Since you are citing 48 kHz as the inflexible default for the WebCodecs implementation of 'opus', you need to update your specification to state that unambiguously, so that I will no longer expect the option I pass to be effectual.

Resampling is necessary to get the output of WebCodecs AudioDecoder AudioData into other APIs - without using setTimeout() and essentially guessing when the incompatible AudioData will end.

I suggest you folks actually test AudioDecoder => MediaStreamTrackGenerator, and stop claiming WebCodecs is "flexible" if you intend on restricting the options available with opusenc and opusdec. I might as well just use opusenc and opusdec with fetch() or WebTransport.