No results found in vtt file
Closed this issue · 7 comments
Problem Description
When using videogrep with a .vtt subtitles file, searches return no results for words which should be found.
How to Reproduce Issue
- Download this YouTube video with youtube-dl or yt-dlp.
- Download the subtitles file, for example:
youtube-dl https://www.youtube.com/watch\?v\=UygnbjIg8CY --skip-download -o toilet --write-auto-sub --sub-lang en --sub-format vtt
- Ensure the video is named toilet.mp4 and the subtitles file is named toilet.vtt.
- Run a search for the word "toilet" on the video, which appears in the subtitles multiple times:
videogrep --input toilet.mp4 --search 'toilet'
- You will see a search result as follows:
No results found for toilet
Discussion
I noticed that each instance of the word toilet is enclosed in <c> tags and has a post-fixed space, e.g. <c>toilet <c>. I tried using a regex like '\s*toilet\s*' and a number of other regexes, but I had no luck.
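One way to see what text a search would actually have to match is to strip the inline tags from a cue line and search what is left. The sketch below is only a diagnostic aid and does not show how videogrep itself parses subtitles. The sample cue line and the helper name `strip_inline_tags` are made up for illustration, modelled on the `<c>` tags described above.

```python
import re

# Matches inline WebVTT markup such as <c>, </c>, or <00:00:01.500>.
INLINE_TAG = re.compile(r"<[^>]+>")


def strip_inline_tags(cue_text: str) -> str:
    """Remove inline tags and collapse the leftover whitespace."""
    plain = INLINE_TAG.sub("", cue_text)
    return " ".join(plain.split())


# Hypothetical cue line in the style of auto-generated captions.
sample = (
    "the<00:00:01.500><c> toilet</c>"
    "<00:00:02.000><c> is</c><00:00:02.300><c> broken</c>"
)

plain = strip_inline_tags(sample)
print(plain)
# A plain word-boundary search works once the tags are gone.
print(bool(re.search(r"\btoilet\b", plain)))
```

If the word shows up in the stripped text, the tags themselves are not what is hiding it, and the cause is more likely elsewhere (for example, which subtitle file is being picked up).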
I think this should be yielding results, but I am likely doing something wrong. I have followed this tutorial as well as the --help documentation included with videogrep, to no avail. Please help me figure this out. I have tried multiple videos with auto-generated subs and even manually created ones, and none work, even with ultra-simple search terms like "the", even when using --search-type fragment.
Note: transcription with vosk is working as expected and yields amazing results. However, I would like to understand why I can't get videogrep to work with the default .vtt files included with the video.
I just came here because I am having the exact, to the letter, same experience and issue. It's like something changed or...? Not sure, but yeah, I will make .vtt's (through Whisper now) and --search-type fragment will always turn back 'no results.' I simply can't use it. Maybe @antiboredom could chime in and tell us we really are just doing something wrong, ha.
Hi @jet3004 and @bxbrenden - just in case you are still having issues here, I'd suggest moving from youtube-dl to yt-dlp. I've been using yt-dlp exclusively for a bit now and can confirm it works...
@antiboredom Hi Sam – didn't want to open a new 'issue,' as this is exactly what I am still experiencing. I'm not using YouTube downloads, so youtube-dl to yt-dlp don't matter much to me, but I am still finding, exactly as before, .vtt or .srt generated by Whisper shows 'no results' when I use --search-type fragment, etc. The Whisper models have improved tremendously in both accuracy and speed since my comment in April and I'd love to use its transcriptions with Videogrep...but it simply doesn't work. Any ideas?
hi @jet3004 unfortunately you can't use --search-type fragment with most .vtt and .srt files. The fragment search requires that the subtitle file has word-level timestamps. srt files typically don't have these. Some .vtt files do (like most of the ones generated by youtube), but many do not...
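A rough way to check whether a given .vtt file carries word-level timestamps is to look for inline cue timestamps such as `<00:00:01.500>` inside the cue text. This is a minimal sketch under that assumption. The function name `has_word_level_timestamps` and both sample strings are hypothetical, and videogrep's own detection may work differently.

```python
import re

# Inline cue timestamp inside cue text: <hh:mm:ss.ttt> or <mm:ss.ttt>.
INLINE_TIMESTAMP = re.compile(r"<(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}>")


def has_word_level_timestamps(vtt_text: str) -> bool:
    """Return True if any cue text line contains an inline timestamp."""
    for line in vtt_text.splitlines():
        # Skip cue timing lines such as "00:00:01.000 --> 00:00:04.000".
        if "-->" in line:
            continue
        if INLINE_TIMESTAMP.search(line):
            return True
    return False


# Hypothetical file with per-word timing (auto-caption style).
word_level = """WEBVTT

00:00:01.000 --> 00:00:03.000
the<00:00:01.500><c> toilet</c><00:00:02.000><c> is</c>
"""

# Hypothetical file with only per-cue timing.
cue_level = """WEBVTT

00:00:01.000 --> 00:00:03.000
the toilet is broken
"""

print(has_word_level_timestamps(word_level))
print(has_word_level_timestamps(cue_level))
```

To run it against a real file, read the file's contents into a string first, for example with `pathlib.Path("toilet.vtt").read_text(encoding="utf-8")`.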
@antiboredom Ah, thanks, I guess my confusion was that I thought the .vtt's generated by Whisper were word-level timestamped...But I guess I don't even see that as an option with them. Wish there was a way. Appreciate it.
@jet3004 there definitely is a way -- if you can export from whisper as a json file that follows the same format that I'm using for the vosk transcriptions... I'd love to add whisper to videogrep at some point...
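As a rough sketch of that idea: the function below reshapes a Whisper-style result into a list of sentence entries with per-word timing. Both ends are assumptions. The input is assumed to look like what openai-whisper returns when word timestamps are requested (segments that each carry a `words` list), and the output layout (`content`, `start`, `end`, `words`) is a guess at the vosk-style JSON, so compare it against a .json file that videogrep itself produced before relying on it. The name `whisper_to_word_json` and the sample data are hypothetical.

```python
import json


def whisper_to_word_json(result: dict) -> list:
    """Reshape a Whisper-style result into sentence entries with word timing.

    ASSUMPTION: the output keys (content/start/end/words) are a guess at the
    layout videogrep uses for vosk transcripts, not a confirmed format.
    """
    entries = []
    for segment in result.get("segments", []):
        words = [
            {"word": w["word"].strip(), "start": w["start"], "end": w["end"]}
            for w in segment.get("words", [])
        ]
        if not words:
            continue  # no word-level timing available for this segment
        entries.append(
            {
                "content": segment["text"].strip(),
                "start": segment["start"],
                "end": segment["end"],
                "words": words,
            }
        )
    return entries


# Hypothetical Whisper-style result, written by hand for illustration.
sample_result = {
    "segments": [
        {
            "text": " the toilet",
            "start": 1.0,
            "end": 2.0,
            "words": [
                {"word": " the", "start": 1.0, "end": 1.4},
                {"word": " toilet", "start": 1.4, "end": 2.0},
            ],
        }
    ]
}

converted = whisper_to_word_json(sample_result)
print(json.dumps(converted, indent=2))
```

Segments without word timing are skipped rather than guessed at, since fragment search depends on the per-word start and end values.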
@antiboredom Ha, for sure, more meant I wish Whisper outputted .json...will look into some more things! Thank you.