Read audio from video directly, without extracting

Question

skorokithakis opened this issue 4 years ago · 2 comments

Currently, ALASS extracts the audio from the video so it can process it, and this step takes most of the running time. If ALASS could read the audio directly from the video, it would presumably be at least twice as fast.

I imagine there are some video libraries that can be used to do this, but I don't know of any.

Answer 1 · 2021-05-02T20:02:43.000Z

This is more complicated than it seems. "Extracting" might be a misnomer here. The "extraction" step is actually "reading from the video/audio file AND decompressing the compressed MP3/AAC/Vorbis stream into raw mono-channel 8 kHz audio". The voice-activity detection module needs this uncompressed mono-channel 8 kHz audio. The actual bytes in the video/audio file cannot be used directly, since they are not in this specific format.

Currently this reading, decompressing and converting is done by invoking (the highly optimized!) ffmpeg, which conveniently also supports practically all container and audio formats. ffmpeg is spawned as a subprocess and its output is read directly from its stdout pipe, so no audio file is ever written to disk. It is also possible to link against ffmpeg and use it as a library. I have already done this (see the README section), but it raises some legal issues, takes slightly more effort to do correctly, and makes compiling the project much more cumbersome. As the README says, this is a little bit faster, but not by much.
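
To illustrate the subprocess approach described above, here is a minimal Python sketch. It is not the code ALASS uses: the function names (`build_ffmpeg_command`, `pcm_bytes_to_samples`, `decode_audio`) are hypothetical, and the choice of signed 16-bit little-endian samples is an assumption, since only "raw mono-channel 8 kHz" is specified above. The ffmpeg options themselves (`-vn`, `-ac`, `-ar`, `-f s16le`, and `-` for stdout) are standard ones.

```python
import subprocess
import sys
from array import array

SAMPLE_RATE = 8000  # 8 kHz, the rate the voice-activity detection expects


def build_ffmpeg_command(media_path):
    """Build a command that decodes the audio of media_path to raw PCM on stdout."""
    return [
        "ffmpeg",
        "-nostdin",            # do not let ffmpeg read from our stdin
        "-loglevel", "error",  # keep stderr quiet unless something fails
        "-i", media_path,
        "-vn",                 # ignore video streams
        "-ac", "1",            # downmix to one channel (mono)
        "-ar", str(SAMPLE_RATE),
        "-f", "s16le",         # assumed sample format: signed 16-bit little-endian
        "-",                   # write to stdout instead of a file
    ]


def pcm_bytes_to_samples(data):
    """Convert raw s16le bytes into an array of signed 16-bit samples."""
    usable = len(data) - (len(data) % 2)  # drop a trailing odd byte, if any
    samples = array("h")
    samples.frombytes(data[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian regardless of host
    return samples


def decode_audio(media_path):
    """Run ffmpeg as a subprocess and return the decoded samples.

    Nothing is written to disk; the decoded audio arrives through the pipe.
    Requires an ffmpeg binary on PATH.
    """
    result = subprocess.run(
        build_ffmpeg_command(media_path),
        stdout=subprocess.PIPE,
        check=True,
    )
    return pcm_bytes_to_samples(result.stdout)
```

Calling `decode_audio("movie.mkv")` (a placeholder file name) would return the whole track in memory. At 8000 samples per second and 2 bytes per sample, that is roughly 16 kB per second of audio. The cost is dominated by the decoding inside ffmpeg, not by the pipe, which is consistent with linking ffmpeg as a library being only a little faster.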

If there is another way to do the reading, decompressing and conversion that saves time, feel free to reopen the issue!

Answer 2 · 2021-05-02T20:21:33.000Z

Ah okay, that makes sense, thanks!