tp7/Sushi

Audio Support!

Closed · 4 comments

I'll try to phrase this in a way that makes sense. Say you have a release from Japan and you want to keep the original video (the source tends to be darker), but you want to include the dub from the NA release. I would compare the Japanese audio tracks from the two releases against each other to determine the offset required for the English dub to line up properly with the Japanese video source.

This is because some regions tend to add arbitrary delays to some releases, or chain episodes together in the same m2ts file on Blu-ray, which makes it time-consuming to split them without offsetting the audio.

At the moment I am running this program on my subtitles to gauge the required offset and then feeding that value to eac3to to shift my audio to match the new video stream. This would be an amazing feature to add to Sushi!

Sushi outputs a list of the shifts it finds to stdout, doesn't it? Then it should be possible to write a shell/cmd script that parses that output using standard bash/cmd techniques and then runs eac3to to cut the audio.

P.S. On second thought, it seems this shift log can't be used directly, since the printed timecodes are apparently tied to the subtitle events. So an additional command-line switch that prints just the audio shifts, without any relation to subtitles, would solve the issue. I guess.

Looking at the output, I could just pick a value from either the debug-level stack or the info-level average, since they're all within a tenth of a millisecond of each other.

I just got the script working for capturing the stdout; parsing it should be straightforward. Thanks for the idea!
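In case it helps anyone else, here's a minimal Python sketch of the idea. The regex is an assumption about the log format (my copy prints per-event lines containing something like `shift: 1.0417` in seconds; adjust it to whatever your version actually emits), the Sushi arguments and file names are placeholders, and the trailing `+NNNms` argument is how I pass a delay to eac3to:

```python
import re
import statistics
import subprocess
import sys

# Placeholder invocation; substitute your real inputs.
SUSHI_CMD = ["sushi", "--src", "jp_release.mkv", "--dst", "na_release.mkv",
             "--script", "subs.ass"]

def collect_shifts(cmd):
    """Run Sushi and pull the per-event shift values out of its output.

    ASSUMPTION: the log contains lines with 'shift: <seconds>'; adjust
    the regex to match whatever your Sushi version actually prints.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    text = proc.stdout + proc.stderr  # Sushi may log to either stream
    shifts = [float(s) for s in re.findall(r"shift:\s*(-?\d+(?:\.\d+)?)", text)]
    if not shifts:
        sys.exit("no shift lines found -- different log format?")
    return shifts

shifts = collect_shifts(SUSHI_CMD)
# For a constant global offset all values should be near-identical,
# so the average is as good as any single one.
delay_ms = round(statistics.mean(shifts) * 1000)
print(f"offset: {delay_ms:+d} ms (spread: {max(shifts) - min(shifts):.6f} s)")

# Apply the delay with eac3to, which takes a +/-<ms>ms argument.
subprocess.run(["eac3to", "audio.flac", "audio_shifted.flac",
                f"{delay_ms:+d}ms"], check=True)
```

Whether you take the average or any single value shouldn't matter here, since as noted above they're all within a tenth of a millisecond of each other.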

tp7 commented

The way Sushi chooses synchronization points will not work for audio.

Consider the following: you have 20 seconds of audio and two subtitle events, 00:00 - 00:05 and 00:15 - 00:20. Now imagine that after processing you see that the first event is shifted 1 second forward and the last one 10 seconds forward. This information alone is not enough to determine where exactly the audio should be split; the split point might be anywhere between 00:05 and 00:15.

Unfortunately, Sushi cannot provide the information you want, and making it do so would considerably complicate the implementation, so I'm not planning to add audio support in the near future. It might be more reasonable to implement this as a separate script, because most of Sushi's postprocessing is also meaningless for audio.

That reasoning seems to assume disjoint segments, and in that case I would agree with you. I'm talking about two audio streams that are identical except that one is offset by, say, 1 second (I can't stress enough that they are identical and that the entire stream is offset from the start). I'm trying to figure out that initial offset between the streams. Normally I would count frames using StackHorizontal in AviSynth.
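For what it's worth, a constant offset like that can also be estimated directly by cross-correlating the two decoded streams. Here's a minimal Python sketch, assuming numpy/scipy and the soundfile module are available, with hypothetical file names (decode the tracks to WAV/FLAC first):

```python
import numpy as np
import soundfile as sf                      # assumed: pip install soundfile
from scipy.signal import correlate, correlation_lags

def find_offset(ref_path, other_path, seconds=30):
    """Estimate the constant delay of `other` relative to `ref`, in seconds.

    Comparing only the first `seconds` of each stream is enough when the
    entire stream is shifted by a single constant offset.
    """
    ref, sr_ref = sf.read(ref_path)
    other, sr_other = sf.read(other_path)
    assert sr_ref == sr_other, "resample first if the sample rates differ"
    n = seconds * sr_ref
    # Mix down to mono and truncate so the correlation stays cheap.
    a = np.atleast_2d(ref.T).mean(axis=0)[:n]
    b = np.atleast_2d(other.T).mean(axis=0)[:n]
    corr = correlate(a, b, mode="full", method="fft")
    lag = correlation_lags(a.size, b.size, mode="full")[np.argmax(corr)]
    # correlate(a, b) peaks at a negative lag when `other` starts later
    # than `ref`, so negate to report the delay of `other`.
    return -lag / sr_ref

# Hypothetical file names.
offset = find_offset("jp_audio.wav", "na_audio.wav")
print(f"NA track is offset by {offset * 1000:+.1f} ms relative to JP")
```

The correlation peak gives the lag directly in samples, which avoids eyeballing frames with StackHorizontal.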

My theory is further backed by the debug output: in this test case (an entire series of 25 episodes), every subtitle within an episode is adjusted by the same constant value (with minor deviations from time to time, like 0.0000008333, but that's WAY too insignificant to matter).

You might ask why I want this information: a specific release I have only includes AC3, not FLAC, and I would rather use the FLAC and sync it to the already-encoded video. Thus finding the offset between the tracks is key. I understand this might be beyond the scope of the project; just thought I'd ask though :)