alopatindev/sync-audio-tracks

Incorrect offset

feanor3 opened this issue · 6 comments

I used your code to sync more than 300 audio tracks and it worked perfectly. Then I tried it with 2 other tracks and the computed offset is wrong: 0.88..., whereas comparing the audio peaks in kdenlive shows a delay of about 0.11.
Do you know what the problem is?
tracks

~0.88 looks like a correct offset to me: both bad.ogg and out.wav look perfectly synced in Audacity. What do you see in Audacity or another sound editor after adding bad.ogg and out.wav as tracks?
[Screenshot: screenshot_00004051]

I'm sorry for the bad title.
The previous and following audio files were extracted from two different video files with ffmpeg -i input -map 0:a output.

This is my timeline (25fps) with bad, good and synced audio (ogg)
[Screenshot: ogg]
Zoomed in, the delay between bad and good is 3 frames, so ~0.12s, but the computed offset is ~0.88s. The synced track is out of sync with good.

This is the timeline with wav files.
[Screenshot: wav]
I tried extracting wav audio instead of ogg from the video files and it works: the first track is bad.wav, and now the offset from good is 22 frames (~0.88s) and the output track is in sync with good.
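For anyone checking these numbers, the frame counts read off the timeline convert to seconds as sketched below. This is only an illustration of the arithmetic; the function name frames_to_seconds is made up here, and the 25 fps default comes from the timeline described above.

```python
def frames_to_seconds(frames: int, fps: float = 25.0) -> float:
    """Convert a frame count read off the timeline into seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


if __name__ == "__main__":
    # 3 frames is the delay seen with the ogg tracks,
    # 22 frames is the delay seen with the wav tracks.
    for frames in (3, 22):
        print(f"{frames} frames at 25 fps = {frames_to_seconds(frames):.2f}s")
```

So the two measurements differ by 19 frames, far more than a reading error on the timeline.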

This is probably due to some bug in sox: for some reason it loses the silence at the beginning after re-encoding the shifted, decompressed audio back to ogg/vorbis. Which sox version are you using?

Strangely I wasn't able to reproduce it even with .ogg extension as output. I have sox 14.4.2.

Anyway I recommend to always use .wav at least for good input and synced output, and then re-encode to whatever format you'd like, using something that was designed specifically for re-encoding, like ffmpeg (if you need re-encoding at all).
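A rough sketch of that wav-first workflow, written as Python helpers that only build the ffmpeg command lines. The helper names and file names are hypothetical; the -map 0:a option is the one used earlier in this thread, and libvorbis is assumed to be available in the local ffmpeg build.

```python
import subprocess
from typing import List


def extract_wav_command(video: str, wav_out: str) -> List[str]:
    """ffmpeg command that extracts the audio of a video into a wav file."""
    return ["ffmpeg", "-i", video, "-map", "0:a", wav_out]


def reencode_ogg_command(wav_in: str, ogg_out: str) -> List[str]:
    """ffmpeg command that re-encodes a synced wav file to ogg/vorbis."""
    return ["ffmpeg", "-i", wav_in, "-c:a", "libvorbis", ogg_out]


def run(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; the sync step itself happens between these two.
    print(" ".join(extract_wav_command("good.mkv", "good.wav")))
    print(" ".join(reencode_ogg_command("synced.wav", "synced.ogg")))
```

The point of the split is that sox only ever writes wav, and the lossy encoding is left to ffmpeg as a final, separate step.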

ok, thanks.
I'm using sox 14.4.2.
I'll use wav and then re-encode the audio.

Closing as hopefully it resolves the issue. Feel free to reopen if it doesn't.

The problem was probably something else: I noticed that the audio extracted from the video with ffmpeg was itself out of sync. The video's metadata contains duration and also start_time, which was ~0.99, and so the offset came out as ~0.888.
I fixed it at extraction time: I got start_time with ffprobe and used sox pad to add silence, so the audio was in sync with the video. Then I used your code and the offset was ~0.12, which is what I was seeing in kdenlive.
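In case it helps someone else, here is a minimal sketch of that fix, not the exact commands used above. It assumes the start value is read from the first audio stream (ffprobe reports it as start_time, as far as I know) and that the extracted audio is a wav file; the function names are made up for illustration.

```python
import json
import subprocess
from typing import List


def parse_start_time(ffprobe_json: str) -> float:
    """Read start_time of the first listed stream from ffprobe JSON output."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return 0.0
    value = streams[0].get("start_time")
    # ffprobe may omit the field or report "N/A"; treat both as no delay.
    return float(value) if value not in (None, "N/A") else 0.0


def probe_audio_start_time(video: str) -> float:
    """Ask ffprobe for the start time of the first audio stream."""
    cmd = ["ffprobe", "-v", "error", "-select_streams", "a:0",
           "-show_entries", "stream=start_time", "-of", "json", video]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_start_time(result.stdout)


def sox_pad_command(wav_in: str, wav_out: str, start_time: float) -> List[str]:
    """sox command that prepends start_time seconds of silence."""
    return ["sox", wav_in, wav_out, "pad", f"{max(start_time, 0.0):.6f}"]


if __name__ == "__main__":
    sample = '{"streams": [{"start_time": "0.990000"}]}'
    print(" ".join(sox_pad_command("bad.wav", "bad_padded.wav",
                                   parse_start_time(sample))))
```

A negative start time is clamped to zero here because pad can only add silence; that case would need trimming instead, which this sketch does not handle.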