Subtitle chunks created, but not shown in playback

Question


Closed this issue 2 months ago · 3 comments

Very cool project! I'm trying to get it up and running, and everything seems to start as it should, but there are very few requests to the .vtt endpoints, and I can't see any subtitles.

I see a couple of requests for the .vtt chunks when I select the subtitle track in the player (VLC, QuickTime Player), but none of these VTT chunks are displayed.

Any thoughts on why that might be?

To reproduce, run:
python main.py -u "https://cph-msl.akamaized.net/hls/live/2000341/test/master.m3u8"

Here are my server logs from starting a client connection in QuickTime and enabling subtitles:

```
192.168.1.143 - - [16/Jun/2024 18:14:20] "GET /chunklist.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:20] "GET /segment_4_20240616_1718561628.ts HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:23] "GET /segment_4_20240616_1718561634.ts HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:26] "GET /chunklist.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:26] "GET /segment_4_20240616_1718561640.ts HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:27] "GET /subs.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:27] "GET /segment_4_20240616_1718561628.vtt HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:27] "GET /segment_4_20240616_1718561634.vtt HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:27] "GET /segment_4_20240616_1718561640.vtt HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:29] "GET /segment_4_20240616_1718561646.ts HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:32] "GET /chunklist.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:32] "GET /segment_4_20240616_1718561652.ts HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:32] "GET /subs.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:35] "GET /chunklist.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:35] "GET /subs.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:38] "GET /chunklist.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:38] "GET /segment_4_20240616_1718561659.ts HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:38] "GET /subs.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:41] "GET /segment_4_20240616_1718561664.ts HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:44] "GET /chunklist.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:44] "GET /subs.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:47] "GET /chunklist.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:47] "GET /subs.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:50] "GET /chunklist.m3u8 HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:50] "GET /segment_4_20240616_1718561670.ts HTTP/1.1" 200 -
192.168.1.143 - - [16/Jun/2024 18:14:53] "GET /segment_4_20240616_1718561677.ts HTTP/1.1" 200 -
```

Answer 1 · 2024-06-16T21:55:11.000Z

Hi Henrik, glad to see you've found some use in it!

That's quite odd indeed. I can't replicate this with the HLS link you provided in my own development environment (Windows + CUDA, VLC 3.0.20). I did notice quite a few hallucinations from the model where it tried to fill in the silences, but it seemed to output subtitles wherever there was speech.

There are a few things that may still be worth trying, though:

- If your machine is CUDA-enabled, it may be worth passing `--use-cuda=false` to the script. This runs on the CPU only, so it'll be slower, but it should at least help narrow things down. I'm not entirely sure whether older cards behave correctly with FP16 precision, as I've only tested this on an RTX 4090 and/or on CPU myself.
- Check the ffmpeg version in your environment. As above, I've been unable to reproduce this under ffmpeg 7.0, and I don't see why older versions would misbehave given how limited its use is within the script, but it may be worth a shot. If the stream's PTS offset differs from the one the script generates for its subtitles, it tends to manifest as what you're seeing: VTT/SRT files are generated, but no subtitles appear in the stream.
- You may also try the `--hard-subs` flag. That embeds the subtitles into the stream directly rather than generating separate VTT files. If the problem lies in the player's handling of WebVTT, that should reveal it.
- For debugging, it may be useful to pull some of the WebVTT files the player is requesting from the local endpoint and check what the script outputs, if anything, for those segments.
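As a rough aid for the PTS check and the debugging step above: in HLS, a WebVTT segment can carry an `X-TIMESTAMP-MAP` header (RFC 8216) that maps a local cue time to an MPEG-TS timestamp in 90 kHz ticks, and a segment without that header is treated as mapping cue time 0 to MPEG-TS 0. The sketch below is not part of the script; `fetch_segment` and `first_cue_pts` are hypothetical helpers that download a segment and convert its first cue's start time into a PTS that can be compared with the matching `.ts` segment.

```python
import re
import urllib.request
from typing import Optional

PTS_CLOCK_HZ = 90_000  # MPEG-TS timestamps tick at 90 kHz
PTS_WRAP = 2 ** 33     # PTS values are 33-bit and wrap around

# Header line, e.g. "X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000"
MAP_LINE = re.compile(r"^X-TIMESTAMP-MAP=(.*)$", re.MULTILINE)
MPEGTS_ATTR = re.compile(r"MPEGTS:(\d+)")
LOCAL_ATTR = re.compile(r"LOCAL:((?:\d+:)?\d{2}:\d{2}\.\d{3})")
# Start time of a cue timing line, e.g. "00:00:01.500 --> 00:00:03.000"
CUE_START = re.compile(r"^((?:\d+:)?\d{2}:\d{2}\.\d{3})[ \t]+-->", re.MULTILINE)


def fetch_segment(url: str) -> str:
    """Download a WebVTT segment (use a URL the player actually requested)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def vtt_time_to_seconds(ts: str) -> float:
    """Convert a WebVTT timestamp ([hh:]mm:ss.ttt) to seconds."""
    parts = ts.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def first_cue_pts(vtt_text: str) -> Optional[int]:
    """Map the first cue's start time to a 90 kHz MPEG-TS PTS.

    Returns None if the segment has no cues. Without an X-TIMESTAMP-MAP
    header, cue time 0 is treated as MPEG-TS 0 (the RFC 8216 default).
    """
    cue = CUE_START.search(vtt_text)
    if cue is None:
        return None
    mpegts, local = 0, 0.0
    header = MAP_LINE.search(vtt_text)
    if header:
        ts_match = MPEGTS_ATTR.search(header.group(1))
        local_match = LOCAL_ATTR.search(header.group(1))
        if ts_match:
            mpegts = int(ts_match.group(1))
        if local_match:
            local = vtt_time_to_seconds(local_match.group(1))
    # Offset of the cue from the mapped local time, converted to 90 kHz ticks.
    offset_ticks = round((vtt_time_to_seconds(cue.group(1)) - local) * PTS_CLOCK_HZ)
    return (mpegts + offset_ticks) % PTS_WRAP
```

For example, `first_cue_pts(fetch_segment("http://<host>:<port>/segment_4_20240616_1718561628.vtt"))`, with your local server's host and port filled in, can be compared with the packet PTS values that `ffprobe -show_packets` reports for `segment_4_20240616_1718561628.ts`. If the two are far apart, the player will likely never reach the cues' time window, which would match the symptom of VTT files being fetched but nothing being displayed.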

Answer 2 · 2024-06-18T08:54:17.000Z

Hi Psychotropos, and thanks for getting back to me so quickly! The issues went away when I switched to my own m3u8 endpoint, and things are working well now.

A follow-up question:
Have you considered feeding concatenated audio segments to the Whisper process to give the transcription more context? I assume that would improve transcription quality. It would take some work to split the transcribed text back into the right chunks, but maybe it's worth a try?
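A rough sketch of the "split it back into chunks" step, assuming the transcription returns word-level timestamps measured from the start of the concatenated audio (openai-whisper, for instance, can produce these with `word_timestamps=True`). The `Word` type and `split_words_by_segment` function are hypothetical illustrations, not part of the script: each word is assigned to the original segment in which it starts, and its times are rebased to that segment.

```python
import bisect
from dataclasses import dataclass
from itertools import accumulate
from typing import List


@dataclass
class Word:
    """A transcribed word; times are seconds from the start of its audio."""
    text: str
    start: float
    end: float


def split_words_by_segment(words: List[Word], durations: List[float]) -> List[List[Word]]:
    """Split words transcribed from concatenated audio back into per-segment lists.

    `durations` are the lengths (seconds) of the original segments, in the order
    they were concatenated. Each word goes to the segment containing its start
    time, and its times are rebased to that segment's start.
    """
    if not durations:
        return []
    # Start offset of each segment within the concatenated audio: [0, d0, d0+d1, ...]
    offsets = [0.0, *accumulate(durations)][:-1]
    total = sum(durations)
    buckets: List[List[Word]] = [[] for _ in durations]
    for w in words:
        if w.start >= total:
            continue  # past the end of the concatenated audio; drop it
        # Last segment whose offset is <= the word's start time.
        idx = max(bisect.bisect_right(offsets, w.start) - 1, 0)
        seg_start = offsets[idx]
        # Clip the end so a word straddling a boundary stays inside its segment.
        end = min(w.end, seg_start + durations[idx])
        buckets[idx].append(Word(w.text, w.start - seg_start, end - seg_start))
    return buckets
```

Each resulting list could then be written out as the cues for that segment's VTT file. Assigning by start time is only one simple policy: a word that straddles a boundary stays in the earlier segment with its end clipped.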

Answer 3 · 2024-07-10T15:24:48.000Z

Hi Henrik,

It's a good suggestion, and something I've considered myself. I'll investigate it further when I find the time.
I'm closing this issue for now since the original problem has been resolved, but do feel free to raise additional issues and/or PRs if you think of anything else. Thanks again.