
Incorrect offset

feanor3 opened this issue · 6 comments

I used your code to sync more than 300 audio tracks and it worked perfectly. Then I tried it with 2 other tracks and the computed offset is wrong: 0.88..., while comparing the audio peaks in kdenlive shows a delay of about 0.11.
Do you know what the problem is?

~0.88 looks like a correct offset to me: both bad.ogg and out.wav look perfectly synced in Audacity. What do you see in Audacity or another sound editor after adding bad.ogg and out.wav as tracks?

I'm sorry for the bad title.
The previous and following audio files were extracted from two different video files with ffmpeg -i input -map 0:a output.

This is my timeline (25fps) with bad, good and synced audio (ogg).
The delay between bad and good, when zoomed in, is 3 frames, so ~0.12s; the computed offset is ~0.88s. The synced track is out of sync with good.

This is the timeline with wav files.
I tried extracting wav audio instead of ogg from the video files and it works: the first track is bad.wav, and now the offset from good is 22 frames (~0.88s) and the output track is in sync with good.
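
For reference, the frame counts quoted above can be converted to seconds with a small helper. This is only a sketch: the function name frames_to_seconds is made up for illustration, and the 25 fps default is taken from the timeline described above.

```python
def frames_to_seconds(frames: int, fps: float = 25.0) -> float:
    """Convert a frame count read off the timeline into seconds."""
    return frames / fps


# The two delays mentioned in this thread, on a 25 fps timeline.
for frames in (3, 22):
    print(f"{frames} frames at 25 fps = {frames_to_seconds(frames):.2f} s")
```

This gives 0.12 s for 3 frames and 0.88 s for 22 frames, matching the two values being compared.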

This is probably due to some bug in sox: for some reason it loses the silence at the beginning after re-encoding the shifted, decompressed audio back to ogg/vorbis. Which sox version are you using?

Strangely, I wasn't able to reproduce it even with an .ogg extension for the output. I have sox 14.4.2.

Anyway, I recommend always using .wav, at least for the good input and the synced output, and then re-encoding to whatever format you'd like with something that was designed specifically for re-encoding, like ffmpeg (if you need re-encoding at all).
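
A rough sketch of that workflow, written as Python helpers that only build the ffmpeg command lines. The helper names and file names are hypothetical, and the libvorbis encoder is an assumption: it is only available if ffmpeg was built with it. The extraction command is the same one quoted earlier in this thread, with a .wav output name.

```python
import subprocess


def extract_wav_command(video: str, wav: str) -> list[str]:
    """Build the command that extracts the audio of a video to a .wav file."""
    return ["ffmpeg", "-i", video, "-map", "0:a", wav]


def reencode_ogg_command(wav: str, ogg: str) -> list[str]:
    """Build the command that re-encodes a synced .wav to ogg/vorbis.

    Assumption: the local ffmpeg build includes the libvorbis encoder.
    """
    return ["ffmpeg", "-i", wav, "-c:a", "libvorbis", ogg]


def run(cmd: list[str]) -> None:
    """Run one of the commands above, raising if the tool fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    print(" ".join(extract_wav_command("good.mp4", "good.wav")))
    print(" ".join(reencode_ogg_command("out.wav", "out.ogg")))
```

The syncing step itself stays between the two commands and works on .wav files only.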

ok, thanks.
I'm using sox 14.4.2.
I'll use wav and then re-encode the audio.

Closing as hopefully it resolves the issue. Feel free to reopen if it doesn't.

The problem was probably something else: I noticed that the audio extracted from the video with ffmpeg was itself out of sync. The video's metadata contains duration and also start_duration, which was ~0.99, and so the offset came out as ~0.888.
I fixed the extraction by getting start_duration with ffprobe and using sox pad to add silence, so that the audio was in sync with the video. Then I used your code and the offset was ~0.12, which is what I was seeing in kdenlive.
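
For anyone hitting the same problem, the fix described above might look roughly like the sketch below. It assumes that the value called start_duration here is what ffprobe reports as the stream's start_time field; the helper names and file names are hypothetical, and a missing or negative start time is treated as zero padding.

```python
import subprocess


def parse_start_time(text: str) -> float:
    """Parse ffprobe's start_time output; "N/A" or empty output means 0."""
    value = text.strip()
    if not value or value == "N/A":
        return 0.0
    # Only the first line matters; never pad by a negative amount.
    return max(0.0, float(value.splitlines()[0]))


def probe_start_time(video: str) -> float:
    """Ask ffprobe for the start_time of the first audio stream, in seconds."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=start_time",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_start_time(result.stdout)


def sox_pad_command(src: str, dst: str, start: float) -> list[str]:
    """Build a sox command that prepends `start` seconds of silence."""
    return ["sox", src, dst, "pad", f"{start:.6f}"]


if __name__ == "__main__":
    # Hypothetical file names: input.mp4 is the video, bad.wav its extracted audio.
    start = probe_start_time("input.mp4")
    subprocess.run(sox_pad_command("bad.wav", "bad_padded.wav", start), check=True)
```

The padded file can then be passed to the sync script in place of the raw extracted audio.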