New feature suggestion: resync/resample audio when packaging DV files in order to correct audio/video desynchronization due to unlocked DV audio

Opened this issue 4 months ago · 0 comments

The package utility in DVRescue currently demuxes the audio from the DV data into separate PCM audio streams stored in the output container - similar to e.g. a type 2 AVI file. This is presumably done for the convenience of other software that might wish to work with / play back the audio.

Unfortunately, this audio is not always well-synchronized with the video. I believe the main cause is DV video recorded by consumer cameras that did not have locked audio (as discussed by Adam Wilt). For example, in one of my DV files, there is an extra audio sample every several video frames. This eventually causes a few dozen milliseconds of desynchronization between video and audio. Presumably this is due to the audio clock in the camcorder running a little faster than the video clock. This issue seems to have been a recurring complaint on Internet mailing lists, forums, etc. over the past couple of decades.

It might be useful for the package utility in DVRescue to try to correct these problems. I have had excellent results prototyping an approach. The key seems to be to repeatedly ask FFmpeg to decode audio samples from only a few video frames at a time from a temporary memory buffer. This is analogous to seeking to the middle of the physical videotape and starting playback from that point, which removes any audio drift that may have accumulated before that location on the videotape. The resulting audio samples are then adjusted so that exactly the expected number of audio samples remains. If the number of extra/missing audio samples is very low, then the extra samples are simply truncated, or silence is inserted, without changing any other audio samples. Larger errors can be corrected using FFmpeg's resampler.
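To make the correction step concrete, here is a minimal sketch of the truncate / pad / resample decision. It is not the code from the linked tool or anything in DVRescue: the function name `fit_to_length`, the `max_pad_or_trim` threshold, and the hand-written linear interpolation are all assumptions made for illustration (a real implementation would hand the larger errors to FFmpeg's resampler instead). It also treats the audio as a single mono channel for simplicity.

```python
def fit_to_length(samples, expected, max_pad_or_trim=2):
    """Return exactly `expected` samples from one decoded batch (mono).

    Hypothetical helper: the threshold and the interpolation fallback are
    illustrative assumptions, not the behavior of any existing tool.
    """
    samples = list(samples)
    error = len(samples) - expected
    if error == 0:
        return samples
    if 0 < error <= max_pad_or_trim:
        # A few extra samples: drop them from the end, leave the rest untouched.
        return samples[:expected]
    if -max_pad_or_trim <= error < 0:
        # A few missing samples: append silence, leave the rest untouched.
        return samples + [0] * (-error)
    # Larger error: stretch or squeeze the whole batch. Linear interpolation
    # is only a stand-in here for a proper resampler.
    if not samples:
        return [0] * expected
    if expected <= 1 or len(samples) == 1:
        return [samples[0]] * expected
    scale = (len(samples) - 1) / (expected - 1)
    out = []
    for i in range(expected):
        pos = i * scale
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

With this sketch, a batch that decodes to 16,017 samples when 16,016 are expected loses only its final sample, while a batch that is off by more than the threshold is rescaled as a whole.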

Here is a link to my tool: https://github.com/JohnstonJ/video-tools/blob/main/src/video_tools/dv_resample_audio.py (documentation here) ... it is MIT-licensed, so please feel free to adopt any parts of it. I would imagine that you'd only use this as inspiration to write a better/faster tool in DVRescue from scratch, but you're also welcome to reuse the actual code, in whole or in part.

Here's an example of how this tool solves the problem for a DV file I have that was recorded by a consumer camcorder:

The DV file has an NTSC frame rate of 29.970 (30000 / 1001), and 32.000 kHz audio. (NOTE: this is the only frame / sample rate I have tested the script with, but it should work for others.)
The DV file is divided up into batches of 15 video frames, which ideally corresponds to exactly 16,016 audio samples. A smaller batch size is not used, because a single video frame corresponds to a non-integer number of audio samples (about 1,067.73), which would force unnecessary resampling.
Every frame in the DV file uses precisely 120,000 bytes. Therefore, we simply copy 15 * 120,000 bytes from the appropriate file offset into a separate memory buffer.
FFmpeg reads this memory buffer of 15 frames. Ideally, we will get 16,016 audio samples as output. Maybe this would be the case if the camera had locked audio. However, in my case, I sometimes see 16,017 audio samples instead, indicating that the audio clock was running a little fast in the camera.
When 16,017 samples are encountered, we simply truncate the extra one, so that 16,016 audio samples remain. (If only 16,015 samples were encountered, then we'd insert a silent sample. Or, if the count were off by many more, then we'd run the FFmpeg resampler.)
The resulting resampled audio is written to a separate output file.
Afterwards, outside of the tool, I use FFmpeg to mux the resampled audio file back into the MKV file that DVRescue packaged for me. (I also use FLAC to save space...)
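
The batch arithmetic in the steps above can be checked with a short sketch. The helper names `smallest_whole_batch` and `read_batch` are hypothetical and are not taken from the linked tool; the constants are the ones quoted above for this particular NTSC file.

```python
from fractions import Fraction
import io

FRAME_RATE = Fraction(30000, 1001)  # NTSC, about 29.970 frames per second
SAMPLE_RATE = 32000                 # audio samples per second
FRAME_BYTES = 120_000               # bytes per DV frame in this file


def smallest_whole_batch(frame_rate, sample_rate):
    """Smallest number of frames that spans a whole number of audio samples."""
    samples_per_frame = Fraction(sample_rate) / frame_rate  # 16016/15 here
    frames = samples_per_frame.denominator
    return frames, int(samples_per_frame * frames)


def read_batch(stream, batch_index, frames_per_batch, frame_bytes=FRAME_BYTES):
    """Copy one batch of raw DV frames from a seekable binary stream."""
    stream.seek(batch_index * frames_per_batch * frame_bytes)
    return stream.read(frames_per_batch * frame_bytes)


frames, samples = smallest_whole_batch(FRAME_RATE, SAMPLE_RATE)
print(frames, samples)  # 15 16016

# Tiny stand-in for a DV file: 3 bytes per "frame", 2 frames per batch.
fake_file = io.BytesIO(bytes(range(12)))
print(read_batch(fake_file, 1, 2, frame_bytes=3))  # bytes 6 through 11
```

Using exact fractions avoids rounding the 1001-based frame rate. The bytes returned by `read_batch` would then be handed to the decoder as an in-memory input, and the decoded sample count compared against the expected 16,016.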

This shows the output of an MKV written by dvpackager, viewed in VirtualDub2. The frame highlighted in blue is the first frame of a new scene that has significantly higher audio levels. Notice that the audio is delayed by almost an entire frame because the camera's audio clock was running fast:

And here is the same frame highlighted after running it through my new dv_resample_audio script. Notice that the high audio levels start at precisely the start of the video frame, meaning that the audio/video drift has been completely cured:

(NOTE: some other tools have dealt with this problem by simply resampling the entire audio stream at once. For example, the abandoned Kino project added a feature that did this; I checked the code changes they made. While this might work reasonably well in the many cases where the clock drift is constant, it's not as precise as correcting a few frames at a time as my code does. Keep in mind that the clock drift could vary with changing camera temperature or other conditions.)
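
As a rough illustration of that last point, the sketch below compares a single whole-stream resample against the ideal timeline when the drift is not constant. The per-batch sample counts are synthetic, chosen only for the example, and the function name is hypothetical. With per-batch correction every batch ends exactly on its ideal sample position by construction, so only the whole-stream case needs computing.

```python
from fractions import Fraction

IDEAL = 16016        # ideal samples per 15-frame batch (32 kHz, NTSC)
SAMPLE_RATE = 32000


def worst_error_after_global_resample(batch_counts, ideal=IDEAL):
    """Largest offset (in samples) at any batch boundary after the whole
    stream has been resampled by one constant ratio."""
    total = sum(batch_counts)
    ratio = Fraction(len(batch_counts) * ideal, total)
    worst = Fraction(0)
    cumulative = 0
    for k, count in enumerate(batch_counts, start=1):
        cumulative += count
        # Where batch k ends after global resampling vs. where it should end.
        worst = max(worst, abs(ratio * cumulative - k * ideal))
    return float(worst)


# Synthetic drift: the audio clock runs fast for 100 batches, then runs true.
counts = [16017] * 100 + [16016] * 100
worst = worst_error_after_global_resample(counts)
print(f"{worst:.1f} samples, {worst / SAMPLE_RATE * 1000:.2f} ms")
```

For this made-up pattern the whole-stream resample ends up perfectly aligned at the very end of the file, but is off by about 50 samples (roughly 1.5 ms) in the middle. Longer recordings or larger changes in drift would scale that error up.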