Decoding mp3/ogg/aac to fix Web Audio API .decodeAudioData() shortcomings?
juj opened this issue · 3 comments
In various conversations throughout the years, the Web Audio API working group has generally closed out feature requests to the API, with the expectation that WebCodecs+AudioWorklets+SAB+Wasm would allow developers to write their own audio decoding+mixing engines to fix the shortcomings that Web Audio API currently has for audio playback parity with native apps ([1], [2], [3], [4], [5] etc.). To recap, the two major shortcomings are:
- Especially in games, one must use uncompressed audio, because compressed audio does not support custom loop points or animating pitch, and seamless looping is not guaranteed with compressed audio files (works with varying success, with a number of open bug reports against browsers [1], [2], [3], [4], [5]).
- All this uncompressed audio takes a huge amount of memory, exacerbated by having to store audio samples as 32-bit float, since 16-bit int is not supported (not to mention the large amount of time and CPU power it takes to fully decode up front).

The general result is that web sites using Web Audio API typically overuse system memory and CPU power. That makes WebCodecs extremely appealing for a large number of developers to adopt.
So I sat down today to try to use WebCodecs to build a replacement for `.decodeAudioData()` that would patch up the above issues by decoding audio on demand. I could not find any examples of how to use the WebCodecs `AudioDecoder` API to decompress an audio file on the fly, so I tried to build my own based only on the spec.

However, I ran into a few issues from the get-go. My small example looks like this:
```js
var decoder = new AudioDecoder({
  output: o => { console.log(o); }, // (3)
  error: e => { console.error(e); }
});
var config = {
  codec: 'mp3', // (1)
  sampleRate: 44100, // (2)
  numberOfChannels: 2 // (2)
};
decoder.configure(config);
var chunk = new EncodedAudioChunk({
  type: 'key', // (2)
  timestamp: 0, // (2)
  // duration: 100, // (2)
  data: compressedAudioAsArrayBuffer
});
decoder.decode(chunk); // (3)
```
which raises the following observations:
1. I need to specify the codec for the input file. For general Ogg Vorbis and MP3 it is possible to deduce this from the file suffix (which needs enforcing that all asset files come with the suffix identifiable, or some side-channel information, but that's passable). However, for AAC the codec description strings seem to be more complex; there seems to be some kind of profile system in play. How would I know which profile the input AAC file had? I'd like to just be able to say `mp4a` or `aac` for an AAC-encoded file, and not have to specify the `*` part of `mp4a.*` (unless it is something trivial that can be statically reasoned without having to write a file format parser?). Ideally I'd just pass `codec: 'autodetect'` (or omit it altogether) and have the system know what kind of audio I have passed.
2. I don't know what to put in the fields marked with (2). These fields are something I would want to just leave out, and have the codec tell me what the input file had. With the exception of the `duration` field, the other fields look like they are mandatory, and Chrome won't decode unless they are specified. Shouldn't the codec be able to derive this info from the input file that is provided to it?
3. Ignoring (2) for now and hardcoding known values: when kicking off `.decode()`, in my test it calls the output callback 1149 times, immediately decoding the whole file to completion, just like `.decodeAudioData()` did, hence winning nothing. I suppose it is doing what I asked for, since I provided the whole file as one chunk. What I instead want is to be able to tell the API to e.g. "decode 1 second forward from current position", or "decode 4096 new samples".

I presume I should be slicing up the input file to pass as the `data` field to pace the decoder, but the issue is that I don't know what chunk size I should pass to the decoder to achieve the 1 second or 4096 new samples that I want. If I choose too high a value, I do excess work and cause excess memory usage.

So as a hack I choose an arbitrary small value, e.g. 16 kB. My test reads like this:
```js
function playAudioDecoderUncompressed(compressedAudioAsArrayBuffer) {
  var decoder = new AudioDecoder({
    output: o => {
      console.log(o);
      if (/* how to know when all samples from the previous .decode() job have finished decoding? */) {
        decodeMore();
      }
    },
    error: e => { console.error(e); }
  });
  var config = {
    codec: 'mp3',
    sampleRate: 44100,
    numberOfChannels: 2
  };
  decoder.configure(config);
  var chunkSize = 16*1024; // Hack: choose some small chunk size to avoid overdecoding
  var chunkOffset = 0;
  function decodeMore() {
    var bytes = Math.min(chunkSize, compressedAudioAsArrayBuffer.byteLength - chunkOffset);
    if (bytes <= 0) return;
    var chunk = new EncodedAudioChunk({
      type: chunkOffset == 0 ? 'key' : 'delta', // I don't really know if this is how this is supposed to work
      timestamp: 0, // I don't know what to put here
      // duration: 100, // I don't know if this should be specified
      data: new Uint8Array(compressedAudioAsArrayBuffer, chunkOffset, bytes)
    });
    chunkOffset += bytes;
    decoder.decode(chunk);
  }
  decodeMore();
}
```
This code decodes the first chunk and calls the `output` callback 29 times, and when I try to call `decodeMore()` after that, I just get a generic `DOMException: Decoding error.` no matter what I try. This is where I hit the wall.

There does not seem to exist an API to tell when all decoding is complete? The `output` callback will get invoked a number of times as a result of calling `.decode()` once, so I don't know how many times the `output` callback will trigger before I should issue a new call to start decoding the next chunk. How should this be achieved?
Am I trying this right? I don't even know if the API is supposed to work like this? Or is there something I am missing?
Thanks for any help!
WebCodecs does not decode files, it decodes raw bitstreams. In the common case the bitstream data will be paired with metadata in a container (e.g. mp4a is in an ISO BMFF, aka MP4, container), and it is the job of the application to extract both.
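For MP3 specifically, that extraction step amounts to splitting the bitstream at frame boundaries, since each MP3 frame carries a self-describing header and decodes independently. A minimal sketch of computing a frame's byte length from its header (my own illustrative helper, MPEG-1 Layer III only; real files also need ID3 tag skipping and MPEG-2/2.5 handling):

```js
// Bitrate (kbps) and sample rate (Hz) tables for MPEG-1 Layer III.
const MP3_BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320];
const MP3_SAMPLE_RATES = [44100, 48000, 32000];

// Compute the byte length of the MPEG-1 Layer III frame starting at `offset`,
// or 0 if no valid frame header is found there. Illustrative sketch only.
function parseMp3FrameLength(bytes, offset) {
  const b1 = bytes[offset], b2 = bytes[offset + 1], b3 = bytes[offset + 2];
  if (b1 !== 0xFF || (b2 & 0xE0) !== 0xE0) return 0; // missing 11-bit frame sync
  const bitrateIndex = (b3 >> 4) & 0xF;
  const sampleRateIndex = (b3 >> 2) & 0x3;
  const padding = (b3 >> 1) & 0x1;
  if (bitrateIndex === 0 || bitrateIndex === 15 || sampleRateIndex === 3) return 0;
  const bitrate = MP3_BITRATES[bitrateIndex] * 1000;
  const sampleRate = MP3_SAMPLE_RATES[sampleRateIndex];
  // Layer III frame length: 144 * bitrate / sampleRate, +1 byte if padded.
  return Math.floor((144 * bitrate) / sampleRate) + padding;
}
```

Each frame found this way could become one `EncodedAudioChunk`, with the timestamp advanced by one frame's worth of samples (1152 for MPEG-1 Layer III) per chunk.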
> How would I know which profile the input AAC file had?
If your file is an mp4a file, you will need to use an ISO BMFF parser (e.g. mp4box.js) to extract this information. I recommend https://gpac.github.io/mp4box.js/test/filereader.html to get a feel for the sort of metadata that is in an mp4a file.
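For what it's worth, the profile part of the string is just the AAC Audio Object Type from the container, in the RFC 6381 form `mp4a.40.<AOT>`. A small sketch of the common mappings (helper name is mine):

```js
// Common AAC Audio Object Types, as found in the container's esds metadata.
// Illustrative subset; other object types exist.
const AAC_OBJECT_TYPES = {
  2: 'AAC-LC',     // mp4a.40.2  - the common case
  5: 'HE-AAC',     // mp4a.40.5  - AAC-LC + SBR
  29: 'HE-AAC v2', // mp4a.40.29 - AAC-LC + SBR + PS
};

// Build the RFC 6381 codec string WebCodecs expects from an object type.
function aacCodecString(audioObjectType) {
  if (!(audioObjectType in AAC_OBJECT_TYPES)) {
    throw new Error(`unsupported AAC object type ${audioObjectType}`);
  }
  return `mp4a.40.${audioObjectType}`;
}
```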
I don't know what to put in to the fields marked with (2)
This information is all in the ISO BMFF container metadata. I'm not certain offhand, but the number of channels may require parsing the AAC codec-specific data.

(Edit: channel count is in the `mp4a` box. It shouldn't be necessary to parse the `esds` in this case.)
> What I instead want is to be able to tell the API to e.g. "decode 1 second forward from current position", or "decode 4096 new samples".
WebCodecs provides only codecs, not a player implementation. It is our hope that these sorts of APIs will be built on top of WebCodecs.
Apologies for the confusion.
With all the conversations that have happened in the past and the eagerness to close out Web Audio bugs in favor of WebCodecs, there has been a misunderstanding that it would directly address these use cases, but now it is clear that it does not. Thanks for the clarification.
@juj, to clarify: while WebCodecs does not include all of the components to address these use cases, it is intended that WebCodecs fills the "decoding" role (while JavaScript fills the "demuxing" role). Folks who presently build mixing engines on top of Web Audio should find this really useful, as it allows you to decode only what you need (vs. the whole file) and just in time. Better flexibility, less memory.
Btw, we are working on a demo that decodes audio (and video) here
chcunningham/wc-talk#1