michaelmob/WebMCam

Audio and video are out of sync for longer duration shots

michaelmob opened this issue · 3 comments

Due to the nature of average FPS shifting and some frames not being captured at all, the video is a little shorter than the audio, with some sections of video moving slower than others. The audio or video needs to be shrunk or stretched to match the other. This can be done in FFmpeg but requires tweaking per recording.

Perhaps there are FFmpeg arguments that could automatically lengthen (or shrink) the video length to match audio duration.

For those that require spot-on audio to match their video, I would suggest using different recording software.

A possible solution could be to compute the real duration of the frames to convert (300 frames / 25 fps = 12 seconds) and compare it against the audio duration. Using the difference, some sort of algorithm could speed up or slow down the video via the video filter option.

-filter:v "setpts=0.25*PTS"
where 0.25 would be replaced with the output of said algorithm.
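As a sketch of how that could be wired up: the helper below is hypothetical (not WebMCam code), and the file names and durations are made-up examples, but it shows the factor feeding into the setpts filter argument.

```python
# Sketch: compute the setpts factor from known durations and build the
# ffmpeg argument list. File names and durations are hypothetical.

def build_retime_args(src, dst, audio_duration, video_duration):
    # setpts multiplies each presentation timestamp, so a factor of
    # audio_duration / video_duration stretches or shrinks the video
    # track to the audio's length.
    factor = audio_duration / video_duration
    return [
        "ffmpeg", "-i", src,
        "-filter:v", f"setpts={factor:.4f}*PTS",
        dst,
    ]

args = build_retime_args("capture.avi", "capture.webm", 300.0, 350.0)
print(args[4])  # -> setpts=0.8571*PTS
```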

From Linux goesZen:

... I found out that relying on the extracted audio length is a good solution, as it appears that ffmpeg seldom (at least it never happened to me) speeds up or slows down audio - it's always played at the right rate, whereas the video often receives a speedup/slowdown.

... As my Blender project had a fps base of 25fps this naturally resulted in offset video. Audio never differs in length as the audio track doesn't know of framerates.

Coupled with your concerns on average FPS shifting and some frames not being shot at all, we can assume that the recorded .wav is always correct. Then we need to take the audio duration as the base and calculate the video factor.

// keep this in floating point; an int cast would discard the
// fractional part of the duration and skew the factor
video_duration = number_of_frames / frames_per_second
audio_duration = audio_file.get_wav_duration()

video_factor = audio_duration / video_duration

// examples
video_duration = 350
audio_duration = 300
video_factor = 0.8571 // 1 sec of video becomes 0.8571 sec, shrinking it to match the audio
// 350 * 0.8571 = 299.9850

video_duration = 300
audio_duration = 350
video_factor = 1.1667 // 1 sec of video becomes 1.1667 sec, stretching it to match the audio
// 300 * 1.1667 = 350.1000
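The arithmetic above translates directly into code; a minimal Python sketch with the same example numbers (the frame counts are assumed values chosen to produce 350 s and 300 s of video):

```python
def video_factor(number_of_frames, frames_per_second, audio_duration):
    # Keep everything in floating point; an int cast here would
    # discard the fractional part of the video duration.
    video_duration = number_of_frames / frames_per_second
    return audio_duration / video_duration

# 8750 frames at 25 fps = 350 s of video against 300 s of audio
print(round(video_factor(8750, 25, 300.0), 4))  # -> 0.8571

# 7500 frames at 25 fps = 300 s of video against 350 s of audio
print(round(video_factor(7500, 25, 350.0), 4))  # -> 1.1667
```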

Of course, I am not a certified mathematician; I only used the two example cases above to formalize the algorithm. Props to my friends Henry and CK for giving me tips on it! I don't know C# in depth either, so I could not send a pull request.

As a note, you might want to make this optional until it is confirmed to work for all users.

edit: I see now you are using ffmpeg.exe and not libffmpeg. My advice may not apply

I solved this in my toy x264 wrapper by using a 1/1000 timebase (milliseconds) and timestamping the frames instead of assuming a fixed FPS.

So I expect frames on PTS 0, 33, 67, 100, 133, 167, 200, but if I have a slowdown then I might get 0, 33, 100, 133, etc., and the decoder / player will compensate for that.

(As opposed to a fixed 30 FPS which would have timebase 1/30 and PTS 0, 1, 2, 3, 4, etc.)
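The mapping from capture times to millisecond PTS values can be sketched like this (timestamps here are made-up wall-clock readings, e.g. from a monotonic clock):

```python
def pts_from_times(capture_times, start):
    # With a 1/1000 timebase, each frame's PTS is simply its capture
    # time in milliseconds since recording started; a dropped frame
    # just leaves a gap in the sequence and the player compensates.
    return [round((t - start) * 1000) for t in capture_times]

# A ~30 fps capture that stalled and lost one frame around 67 ms:
times = [10.000, 10.033, 10.100, 10.133]
print(pts_from_times(times, 10.000))  # -> [0, 33, 100, 133]
```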

This works whether the video is fast, slow, or completely inconsistent. Since you guys are using ffmpeg, it is probably the same style of API as I used (x264 into FFmpeg). WebM is already Matroska-based, and this trick works for both MP4 and Matroska containers.

The only catch is that this might cause VLC to report the video as "1000 FPS", but it will still play fine because there are only 30 / 60 actual frames per second.

For ffmpeg.exe, you may be able to run a monotonic counter and just write the previous frame to disk twice when you detect that you've dropped a frame. It's a bit of a waste but it will keep the timing in line.
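The frame-duplication idea can be sketched as follows; this is a simplified model (frame data as plain values, timestamps in seconds), not WebMCam's actual capture loop:

```python
def pad_dropped_frames(frames, fps=30):
    # frames: list of (timestamp_seconds, frame_data) pairs from capture.
    # Walk a monotonic slot counter; whenever a timestamp has jumped past
    # the slot we expected, write the previous frame again so the
    # fixed-FPS output stays in sync with the audio.
    out = []
    interval = 1.0 / fps
    expected = frames[0][0]
    prev = frames[0][1]
    for ts, data in frames:
        while ts > expected + interval / 2:
            out.append(prev)        # re-emit last frame for the missed slot
            expected += interval
        out.append(data)
        prev = data
        expected += interval
    return out

# One frame dropped between "b" and "c":
frames = [(0.000, "a"), (0.033, "b"), (0.100, "c")]
print(pad_dropped_frames(frames))  # -> ['a', 'b', 'b', 'c']
```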