zergon321/reisen

drift between video and audio stream in example player

Closed this issue · 2 comments

Another question after spending more time with the video player example.

When playing longer videos I notice a slight drift between the video and the audio stream over time. This becomes even more pronounced when I add some extra image processing. I think I understand what the issue is: essentially, the video frame processing and the audio playback take place on their own goroutines without any synchronization between them. So I guess the video processing could fall behind while the audio processing keeps up? I understand that the player is just a very simple example, so this behavior is probably not surprising. Nevertheless, any recommendation on how to synchronize audio and video better would be great.

Also, just for my better understanding: I noticed that media.ReadPacket() returns audio and video packets for all open streams, seemingly at random. Most of the time video and audio packets alternate perfectly, although every once in a while I see 2 video packets for 1 audio packet. What is the rationale here in terms of stream synchronization?

Hi.

As for media.ReadPacket(): yeah, the packets of different streams are interleaved in the media file. This is necessary so you don't have to read all the data of one stream before reaching the beginning of another stream. And no, there's no guarantee of strict alternation. A series of packets of one stream can be preceded by any number of packets of another stream. That's because sometimes one packet is not enough to decode a whole video frame (source). So yeah, the only way to do media decoding is to read packets one by one and check the type of each packet and which stream it belongs to.
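
To illustrate the "read one by one and check the type" loop without depending on any library, here is a minimal sketch. The packet and streamKind types and the demux function are hypothetical stand-ins made up for this example; they are not the reisen API. The point is only that the loop dispatches on each packet's type and never assumes that video and audio alternate.

```go
package main

import "fmt"

// streamKind and packet are hypothetical stand-ins for illustration;
// they are not the reisen types.
type streamKind int

const (
	kindVideo streamKind = iota
	kindAudio
)

type packet struct {
	kind streamKind
	seq  int // position of the packet in the file
}

// demux walks the interleaved packets in file order and sorts each one
// into a per-stream queue based on its type. It makes no assumption
// about how video and audio packets alternate.
func demux(packets []packet) (video, audio []packet) {
	for _, p := range packets {
		switch p.kind {
		case kindVideo:
			video = append(video, p)
		case kindAudio:
			audio = append(audio, p)
		}
	}
	return video, audio
}

func main() {
	// Two video packets arrive in a row before the next audio packet.
	file := []packet{
		{kindVideo, 0}, {kindAudio, 1},
		{kindVideo, 2}, {kindVideo, 3}, {kindAudio, 4},
	}
	video, audio := demux(file)
	fmt.Println("video packets:", len(video), "audio packets:", len(audio))
}
```

In a real player the slice would be replaced by repeated calls to media.ReadPacket(), and each case would hand the packet to the decoder of the stream it belongs to.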

As for the drift: yes, I noticed that myself. The music usually starts a bit earlier than the visual part (you don't have to open a GUI window to play the music). Also, video playback and audio playback are handled by libraries that are independent of each other. That's why I thought of introducing some kind of delay to synchronize the playbacks, but it's not a reliable way.

But what would be really useful for synchronizing streams is the presentation timestamp. The presentation timestamp is a time offset from the start of the media file. It's the exact moment when the decoded frame should be presented to the user. In Reisen, you can access it via the PresentationOffset() method of a frame of any type. But unfortunately, I don't know the most efficient way to use it. I thought of comparing it against the current time with an if statement, calling time.AfterFunc() over and over again, etc. You will have to come up with a solution yourself.
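
One possible way to use the presentation timestamp, as a rough sketch rather than a recommended design: compare each frame's offset with the time elapsed since playback started and sleep for the difference. The untilPresentation and present helpers are hypothetical names made up for this example, the offsets are hard-coded instead of coming from PresentationOffset(), and the wall clock is assumed to be the reference clock (a real player might prefer the audio position).

```go
package main

import (
	"fmt"
	"time"
)

// untilPresentation returns how long to wait before showing a frame
// whose presentation offset is pts, given that elapsed playback time
// has already passed. A negative result means the frame is late.
func untilPresentation(pts, elapsed time.Duration) time.Duration {
	return pts - elapsed
}

// present sleeps until the frame is due and then calls show.
// start is the wall-clock moment at which playback began.
// Late frames are shown immediately without sleeping.
func present(start time.Time, pts time.Duration, show func()) {
	if wait := untilPresentation(pts, time.Since(start)); wait > 0 {
		time.Sleep(wait)
	}
	show()
}

func main() {
	start := time.Now()

	// Hypothetical offsets for a 25 fps stream (one frame every 40 ms).
	offsets := []time.Duration{
		0,
		40 * time.Millisecond,
		80 * time.Millisecond,
	}

	for _, pts := range offsets {
		pts := pts
		present(start, pts, func() {
			fmt.Printf("frame with offset %v shown after %v\n",
				pts, time.Since(start).Round(time.Millisecond))
		})
	}
}
```

If both the video goroutine and the audio goroutine schedule their output against the same start time, neither can run ahead of the other by more than the scheduling jitter.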

"This becomes even more pronounced when I add some extra image processing", - well, It may happen because you put all your image rotating/flipping code in the game Update() method. If that's true, consider moving it to the ReadVideoAndAudio() function so all the buffered frames are already transformed (and yeah, video decoding is started a bit earlier than running the "game").

Or you can rewrite the example so all the decoding and video/audio playback start only after the window is created (game.Start() starts decoding and audio playback, ebiten.Run(game) opens the window and starts video playback).

After debugging this further I found that the frame buffer backs up more and more the longer my video plays. Basically this is due to the heavy image processing I'm performing, so it is really entirely my problem. My solution for now is simply to improve my image processing algorithm so it is fast enough to keep up with the audio stream. I also have some ideas on how to synchronize audio and video, but I haven't implemented them yet. Just as a fun hack, I've implemented an ASCII converter that can play videos via a text interface. If you are curious, my project is located here: https://github.com/boriwo/movart
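
For anyone who hits the same backed-up buffer and cannot speed up the processing, one common alternative is to skip frames that are already too late. The sketch below is only an illustration of that idea, not what movart or the example player does: the frame type, the nextDue function, and the tolerance value are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// frame is a hypothetical decoded frame; only its presentation
// offset matters for this sketch.
type frame struct {
	pts time.Duration
}

// nextDue discards buffered frames that are late by more than
// tolerance and returns the first frame still worth showing,
// together with the frames that remain after it. ok is false
// when every buffered frame was too late.
func nextDue(buf []frame, elapsed, tolerance time.Duration) (f frame, rest []frame, ok bool) {
	for i, c := range buf {
		if elapsed-c.pts <= tolerance {
			return c, buf[i+1:], true
		}
	}
	return frame{}, nil, false
}

func main() {
	buf := []frame{
		{0},
		{40 * time.Millisecond},
		{80 * time.Millisecond},
		{120 * time.Millisecond},
	}

	// Suppose slow processing means 100 ms of playback have already passed.
	f, rest, ok := nextDue(buf, 100*time.Millisecond, 25*time.Millisecond)
	fmt.Println("next frame offset:", f.pts, "remaining:", len(rest), "found:", ok)
}
```

Dropping frames trades smoothness for staying in step with the audio, so the buffer stops growing even when processing is occasionally too slow.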