No way to convert data from WebCodecs AudioData to AudioBuffer
guest271314 opened this issue · 7 comments
Describe the feature
WebCodecs defines `AudioData`. In the WebCodecs specification this note appears:

> NOTE: The Web Audio API currently uses f32-planar exclusively.

However, the format of `AudioData` from `AudioDecoder` is `'f32'`, not `'f32-planar'`.
Even though the `sampleRate` set at `AudioDecoder` configuration is other than 48000 (and `opusenc` supports a `--raw-rate` option to explicitly set the sample rate for Opus-encoded audio), the resulting WebCodecs `AudioData` instance always has `sampleRate` set to 48000.
The effective result is that there is no way that I am aware of to convert the data from `AudioData.copyTo(ArrayBuffer, {planeIndex: 0})` to an `AudioBuffer` instance that can be played with an `AudioBufferSourceNode` or resampled to a different `sampleRate`, for example, 22050.
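For concreteness, this is roughly the kind of manual work the `'f32'` (interleaved) layout forces before samples fit Web Audio's planar model. This is only an illustrative sketch, not an existing API: it assumes a stereo `AudioData`, and the function name is made up.

```js
// Sketch: pull an interleaved 'f32' AudioData apart into one Float32Array per
// channel, i.e. the 'f32-planar' layout the Web Audio API expects.
function deinterleave(audioData) {
  const { numberOfChannels, numberOfFrames } = audioData;
  const interleaved = new Float32Array(numberOfFrames * numberOfChannels);
  // 'f32' is interleaved, so every channel lives in the single plane at index 0.
  audioData.copyTo(interleaved, { planeIndex: 0 });
  const planes = [];
  for (let c = 0; c < numberOfChannels; c++) {
    const plane = new Float32Array(numberOfFrames);
    for (let i = 0; i < numberOfFrames; i++) {
      plane[i] = interleaved[i * numberOfChannels + c];
    }
    planes.push(plane);
  }
  return planes;
}
```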
Since `MediaStreamTrackGenerator` suffers from "overflow", and no algorithm exists in the WebCodecs specification to handle the overflow other than one defined by the user, it is necessary for the user to write that algorithm. After testing, a user might find a magic number to delay the next call to `write()` on the `WritableStreamDefaultWriter` of `MediaStreamTrackGenerator.writable` (https://plnkr.co/edit/clbdVbhaRhCKWmPS), but that approach does not achieve the same result when attempting to use a Web Audio API `AudioBuffer` and `AudioBufferSourceNode`:
```js
async function main() {
  const oac = new AudioContext({
    sampleRate: 48000,
  });
  let channelData = [];
  const decoder = new AudioDecoder({
    error(e) {
      console.error(e);
    },
    async output(frame) {
      const { duration: d } = frame;
      const size = frame.allocationSize({ planeIndex: 0 });
      const data = new ArrayBuffer(size);
      frame.copyTo(data, { planeIndex: 0 });
      const view = new Float32Array(data);
      for (let i = 0; i < view.length; i++) {
        if (channelData.length === 220) {
          const floats = new Float32Array(220);
          floats.set(channelData.splice(0, 220));
          const ab = new AudioBuffer({
            sampleRate: 48000,
            length: floats.length,
            numberOfChannels: 1,
          });
          ab.getChannelData(0).set(floats);
          const source = new AudioBufferSourceNode(oac, { buffer: ab });
          source.connect(oac.destination);
          console.log(ab.duration, ab.sampleRate);
          source.start();
          await new Promise((r) => {
            console.log(ab);
            source.onended = r;
          });
        }
        channelData.push(view[i]);
      }
      if (decoder.decodeQueueSize === 0) {
        if (channelData.length) {
          const floats = new Float32Array(220);
          floats.set(channelData.splice(0, 220));
          const ab = new AudioBuffer({
            sampleRate: 48000,
            length: floats.length,
            numberOfChannels: 1,
          });
          ab.getChannelData(0).set(floats);
          console.log(ab.duration, ab.sampleRate);
          const source = new AudioBufferSourceNode(oac, { buffer: ab });
          source.connect(oac.destination);
          source.start();
          await new Promise((r) => (source.onended = r));
          await decoder.flush();
          return;
        }
      }
    },
  });
  const encoded = await (await fetch('./encoded.json')).json();
  let base_time = encoded[encoded.length - 1].timestamp;
  console.assert(encoded.length > 0, encoded.length);
  console.log(JSON.stringify(encoded, null, 2));
  const metadata = encoded.shift();
  console.log(encoded[encoded.length - 1].timestamp, base_time);
  // decoderConfig.description is stored as base64; base64ToBytesArr() is defined elsewhere
  metadata.decoderConfig.description = new Uint8Array(
    base64ToBytesArr(metadata.decoderConfig.description)
  ).buffer;
  console.log(await AudioDecoder.isConfigSupported(metadata.decoderConfig));
  decoder.configure(metadata.decoderConfig);
  while (encoded.length) {
    const chunk = encoded.shift();
    chunk.data = new Uint8Array(base64ToBytesArr(chunk.data)).buffer;
    const eac = new EncodedAudioChunk(chunk);
    decoder.decode(eac);
  }
}
```
verifying the `AudioData` data and `AudioBuffer` channel data are incompatible.
Is there a prototype?
No.
Describe the feature in more detail
Web Audio API `AudioBuffer` <=> WebCodecs `AudioData`

Provide an algorithm and method to convert WebCodecs `AudioData` to a Web Audio API `AudioBuffer`, with the option to set the sample rate of the resulting object.
I used an `OfflineAudioContext` to resample the hard-coded 48000 sample rate and `numberOfFrames` (2568 for the first and 2880 for the remainder of the `AudioData` objects output by `AudioDecoder`). The 2880 figure follows from Chromium's Opus defaults, 60 ms at 48000 Hz being 2880 frames per `AudioData`:

```cpp
// For Opus, we try to encode 60ms, the maximum Opus buffer, for quality
// reasons.
constexpr int kOpusPreferredBufferDurationMs = 60;

// default preferred 48 kHz. If the input sample rate is anything else, we'll
// use 48 kHz.
```
something like:

```js
const TARGET_FRAME_SIZE = 220;
const TARGET_SAMPLE_RATE = 22050;
// ...
const config = {
  numberOfChannels: 1,
  sampleRate: 22050, // Chrome hardcodes to 48000
  codec: 'opus',
  bitrate: 16000,
};
// `encoder` is the AudioEncoder that produced the chunks (created elsewhere)
encoder.configure(config);
const decoder = new AudioDecoder({
  error(e) {
    console.error(e);
  },
  async output(frame) {
    // chunk_length, len, channelData, decoderController and decoderResolve
    // are defined in the surrounding code (omitted here)
    ++chunk_length;
    const { duration, numberOfChannels, numberOfFrames, sampleRate } = frame;
    const size = frame.allocationSize({ planeIndex: 0 });
    const data = new ArrayBuffer(size);
    frame.copyTo(data, { planeIndex: 0 });
    const buffer = new AudioBuffer({
      length: numberOfFrames,
      numberOfChannels,
      sampleRate,
    });
    buffer.getChannelData(0).set(new Float32Array(data));
    // Resample this segment with an OfflineAudioContext
    // https://stackoverflow.com/a/27601521
    const oac = new OfflineAudioContext(
      buffer.numberOfChannels,
      buffer.duration * TARGET_SAMPLE_RATE,
      TARGET_SAMPLE_RATE
    );
    // Play it from the beginning.
    const source = new AudioBufferSourceNode(oac, {
      buffer,
    });
    source.connect(oac.destination);
    source.start();
    const ab = (await oac.startRendering()).getChannelData(0);
    for (let i = 0; i < ab.length; i++) {
      if (channelData.length === TARGET_FRAME_SIZE) {
        const floats = new Float32Array(
          channelData.splice(0, TARGET_FRAME_SIZE)
        );
        decoderController.enqueue(floats);
      }
      channelData.push(ab[i]);
    }
    if (chunk_length === len) {
      if (channelData.length) {
        const floats = new Float32Array(TARGET_FRAME_SIZE);
        floats.set(channelData.splice(0, channelData.length));
        decoderController.enqueue(floats);
        decoderController.close();
        decoderResolve();
      }
    }
  },
});
```
The audio playback quality is sub-par when resampling from 48000 to 22050. What is the suggested procedure to produce quality audio without glitches, gaps, or faster- or slower-rate frames when converting from WebCodecs `AudioData` to `AudioBuffer` for the purpose of breaking out of the hard-coded box of the Chrome WebCodecs implementation?
The current design direction is to be able to create `AudioBuffer` objects directly from typed arrays, and to allow `AudioBuffer` to internally use more data types than f32. For now, authors need to create an `AudioBuffer` of the same size, use `AudioData.copyTo` to copy to an intermediate `ArrayBuffer`, and then copy (with possible conversion) to the `AudioBuffer`. This is wasteful and not ergonomic.
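A minimal sketch of that interim path, assuming a mono `AudioData` (the helper name is illustrative, not part of any API):

```js
// Sketch: AudioData -> intermediate ArrayBuffer -> AudioBuffer, as described above.
function audioDataToAudioBuffer(audioData) {
  const audioBuffer = new AudioBuffer({
    numberOfChannels: 1,
    length: audioData.numberOfFrames,
    sampleRate: audioData.sampleRate,
  });
  // First copy: into an intermediate ArrayBuffer (mono data is a single plane).
  const intermediate = new ArrayBuffer(
    audioData.allocationSize({ planeIndex: 0 })
  );
  audioData.copyTo(intermediate, { planeIndex: 0 });
  // Second copy (with possible conversion) into the AudioBuffer's channel data.
  audioBuffer.copyToChannel(new Float32Array(intermediate), 0);
  return audioBuffer;
}
```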
Another design direction is to be able to get the memory of an `AudioData` and directly construct an `AudioBuffer` from this memory, skipping all copies (w3c/webcodecs#287).
There are several issues.
- `AudioData` at `AudioDecoder.output` is absolutely dissimilar from the input `AudioData` at `AudioEncoder.encode` - developers use codecs for compression, not for implementation restrictions. Might as well just use `opusenc` directly, or in WASM form, if we cannot do the equivalent of `opusenc --raw-rate 22050 input.wav output.opus` in the `AudioEncoder` configuration - the options are ignored by the implementation. I can use Native Messaging, `fetch()` reliably, `WebTransport` far less reliably, to stream input from the browser and get STDOUT from a native application.
- Further resampling and conversion to data with a shorter length is needed to accommodate `MediaStreamTrackGenerator`, where the `AudioData` output from an `OscillatorNode` connected to a `MediaStreamAudioDestinationNode` and processed with `MediaStreamTrackProcessor` is also dissimilar to WebCodecs `AudioDecoder.decode()` output at the `output` callback - with the same input (see the sketch after this list).
- `AudioWorklet` results in fewer glitches than `MediaStreamTrackGenerator` - when the input is processed in a `Worker` and `WebAssembly.Memory.grow()` is used - because when multiple `ReadableStream`s are processed in parallel and piped through and to other streams on the same thread, one can take priority and result in glitches in initial playback until the input is completely read. However, the only way to get an `AudioWorklet` instance is via an ECMAScript module, which limits usage due to CSP, and `AudioWorklet` does not expose `fetch()` or `WebTransport` - thus, use a single memory with the ability to grow; WebAssembly collects garbage.
- Calling `MediaStreamTrackGenerator` `stop()` can crash the tab.
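As a rough illustration of the user-defined conversion the second item describes, decoded mono samples can be re-packaged into fixed-size planar `AudioData` frames before being written to a `MediaStreamTrackGenerator`. This is a sketch only: `FRAME_SIZE`, the timestamp bookkeeping, and `writeSamples` are assumptions, and, as noted above, relying on stream backpressure alone has not proven sufficient in practice.

```js
// Sketch: split decoded mono samples into small 'f32-planar' AudioData frames
// and write them to a MediaStreamTrackGenerator (Chrome's implementation).
const generator = new MediaStreamTrackGenerator({ kind: 'audio' });
const writer = generator.writable.getWriter();
const FRAME_SIZE = 220; // illustrative frame size
let timestamp = 0; // microseconds

async function writeSamples(samples, sampleRate) {
  for (let offset = 0; offset < samples.length; offset += FRAME_SIZE) {
    const chunk = samples.subarray(offset, offset + FRAME_SIZE);
    const frame = new AudioData({
      format: 'f32-planar',
      sampleRate,
      numberOfFrames: chunk.length,
      numberOfChannels: 1,
      timestamp,
      data: chunk,
    });
    timestamp += (chunk.length / sampleRate) * 1e6;
    // Await the write instead of guessing a delay with setTimeout().
    await writer.write(frame);
  }
}
```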
In summary, there needs to be consistency between these burgeoning APIs so that user-defined conversion is not necessary, or so that, if the user does decide to convert between `AudioData` and `AudioBuffer`, it can be done "seamlessly". WebCodecs has free rein to do whatever it wants - why would the decoder only output a 48000 sample rate when I deliberately input a 22050 sample rate and 1 channel in the configuration? That is inviting user-defined conversion (and its issues).
I updated and tested the code using `OfflineAudioContext` a few hundred more times and compared it to creating a WAV file using data from `AudioData.copyTo()`:
```js
// https://github.com/higuma/wav-audio-encoder-js
class WavAudioEncoder {
  constructor({ buffers, sampleRate, numberOfChannels }) {
    Object.assign(this, {
      buffers,
      sampleRate,
      numberOfChannels,
      numberOfSamples: 0,
      dataViews: [],
    });
  }
  setString(view, offset, str) {
    const len = str.length;
    for (let i = 0; i < len; i++) {
      view.setUint8(offset + i, str.charCodeAt(i));
    }
  }
  async encode() {
    const [{ length }] = this.buffers;
    // Interleave the float samples as 16-bit PCM.
    const data = new DataView(
      new ArrayBuffer(length * this.numberOfChannels * 2)
    );
    let offset = 0;
    for (let i = 0; i < length; i++) {
      for (let ch = 0; ch < this.numberOfChannels; ch++) {
        let x = this.buffers[ch][i] * 0x7fff;
        data.setInt16(
          offset,
          x < 0 ? Math.max(x, -0x8000) : Math.min(x, 0x7fff),
          true
        );
        offset += 2;
      }
    }
    this.dataViews.push(data);
    this.numberOfSamples += length;
    // Write the 44-byte RIFF/WAVE header.
    const dataSize = this.numberOfChannels * this.numberOfSamples * 2;
    const view = new DataView(new ArrayBuffer(44));
    this.setString(view, 0, 'RIFF');
    view.setUint32(4, 36 + dataSize, true);
    this.setString(view, 8, 'WAVE');
    this.setString(view, 12, 'fmt ');
    view.setUint32(16, 16, true); // fmt chunk size
    view.setUint16(20, 1, true); // PCM format
    view.setUint16(22, this.numberOfChannels, true);
    view.setUint32(24, this.sampleRate, true);
    // byte rate = sampleRate * blockAlign
    view.setUint32(28, this.sampleRate * this.numberOfChannels * 2, true);
    view.setUint16(32, this.numberOfChannels * 2, true); // block align
    view.setUint16(34, 16, true); // bits per sample
    this.setString(view, 36, 'data');
    view.setUint32(40, dataSize, true);
    this.dataViews.unshift(view);
    return new Blob(this.dataViews, { type: 'audio/wav' }).arrayBuffer();
  }
}
// ...
// `data` is the ArrayBuffer filled by AudioData.copyTo(); `ac` is an AudioContext.
const wav = new WavAudioEncoder({
  sampleRate: 48000,
  numberOfChannels: 1,
  buffers: [new Float32Array(data)],
});
const ab = (await ac.decodeAudioData(await wav.encode())).getChannelData(0);
```
Glitches can occasionally occur in the beginning of the `OfflineAudioContext` playback. No glitches occur creating WAV headers and prepending the headers to the data. Test and compare the differences for yourself: https://guest271314.github.io/webcodecs/.

Are these the simplest approaches to resample the output from `AudioDecoder.decode()`?
The important point is that it is only necessary to resample the data from `AudioData` at `AudioDecoder.output` because WebCodecs does not honor `AudioEncoder` or `AudioDecoder` configuration and resamples to 48000, and outputs a `numberOfFrames` far greater than the input `numberOfFrames`, which is inconsistent behaviour.
If there was consistency between WebCodecs `AudioEncoder.output` and `AudioDecoder.output` with regard to `AudioData` there would be no need to resample with the Web Audio API.
Two things:

- This approach to resample segments of audio with an `OfflineAudioContext` cannot work. Non-naive audio resampling is a stateful operation, and creating a new `OfflineAudioContext` each time doesn't allow keeping any state. Resampling using an `OfflineAudioContext` only works if the entirety of the audio is resampled in one operation (a sketch of the whole-stream approach follows this list).
- Resampling to another rate is not in the scope of Web Codecs. Web Codecs is just about decoding and encoding, and resampling the audio to play it out is expected for now, since there is no resampler object in the Web Platform yet. Opus always works in 48kHz internally, and by default always decodes to 48kHz, so this is what you see in Web Codecs. For other codecs, you'll see that the rate is (usually) the rate of the input stream.
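A sketch of what resampling "in one operation" can look like, assuming the complete decoded mono signal has already been collected into a single `Float32Array` (names and default rates are illustrative):

```js
// Sketch: resample the whole stream once, so the resampler's state spans the
// entire signal instead of being reset for every segment.
async function resampleWhole(allSamples, fromRate = 48000, toRate = 22050) {
  const buffer = new AudioBuffer({
    numberOfChannels: 1,
    length: allSamples.length,
    sampleRate: fromRate,
  });
  buffer.copyToChannel(allSamples, 0);
  const oac = new OfflineAudioContext(
    1,
    Math.ceil((allSamples.length * toRate) / fromRate),
    toRate
  );
  const source = new AudioBufferSourceNode(oac, { buffer });
  source.connect(oac.destination);
  source.start();
  return (await oac.startRendering()).getChannelData(0);
}
```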
The problem is resampling is necessary based on WebCodecs output.

All you need do is test the output of `AudioDecoder` and try to pass that `AudioData` directly to a `MediaStreamTrackGenerator`. One of two outcomes currently exists without user-defined intervention:
- The rate of playback will be increased https://bugs.chromium.org/p/chromium/issues/detail?id=1184070
- The tab will crash https://bugs.chromium.org/p/chromium/issues/detail?id=1244416
I suspect you folks have not really tested `AudioDecoder` => `MediaStreamTrackGenerator`.
I can do `$ opusenc --raw-rate 22050 input.wav output.opus` and get the output I set. WebCodecs ignores the configuration, yet claims "flexibility". Since you are citing 48kHz as the inflexible default for the WebCodecs implementation of 'opus', you need to update your specification to state that unambiguously, so that I will no longer expect the option I pass to be effectual.
Resampling is necessary to pass the `AudioData` output of the WebCodecs `AudioDecoder` to other APIs - without it, one is left using `setTimeout()` and essentially guessing when the incompatible `AudioData` will end.
I suggest you folks actually test `AudioDecoder` => `MediaStreamTrackGenerator`, and stop claiming WebCodecs is "flexible" if you intend on restricting options available using `opusenc` and `opusdec`. I might as well just use `opusenc` and `opusdec` with `fetch()` or `WebTransport`.