antiboredom/videogrep

Automatically refine word-level alignments from sentence-level alignments

ryanfb opened this issue · 2 comments

First of all, thanks so much for all your work on this and for making it open source! It would be cool if it were possible to do a fragment search using an existing SRT transcription without having to re-transcribe all of the audio in advance. One way to do this would be to use the existing sentence-level alignments to extract the audio ranges for sentences that match a search, transcribe just those audio ranges with vosk, and then use those word-level results to extract the fragment-level audio.
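
To illustrate what I mean, here's a rough sketch (not part of videogrep), assuming ffmpeg is on the PATH, the `srt` and `vosk` packages are installed, and a vosk model has been unpacked locally; the file names and search term are placeholders:

```python
import json
import subprocess
import tempfile
import wave

import srt
from vosk import Model, KaldiRecognizer

SEARCH_TERM = "banana"   # word to look for (placeholder)
VIDEO = "input.mp4"      # source video (placeholder)
SUBS = "input.srt"       # existing sentence-level transcription (placeholder)
MODEL_DIR = "model"      # path to an unpacked vosk model (placeholder)


def clip_to_wav(video, start, end):
    """Extract one sentence's audio as 16 kHz mono WAV via ffmpeg."""
    out = tempfile.NamedTemporaryFile(suffix=".wav", delete=False).name
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(end - start),
         "-i", video, "-ar", "16000", "-ac", "1", out],
        check=True, capture_output=True)
    return out


def word_timestamps(wav_path, model):
    """Run vosk on a short clip and return its word-level timings."""
    rec = KaldiRecognizer(model, 16000)
    rec.SetWords(True)
    words = []
    with wave.open(wav_path, "rb") as wf:
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):
                words += json.loads(rec.Result()).get("result", [])
    words += json.loads(rec.FinalResult()).get("result", [])
    return words


model = Model(MODEL_DIR)
with open(SUBS) as f:
    subs = list(srt.parse(f.read()))

fragments = []
for sub in subs:
    # Only re-transcribe sentences that already match the search.
    if SEARCH_TERM.lower() not in sub.content.lower():
        continue
    start, end = sub.start.total_seconds(), sub.end.total_seconds()
    wav = clip_to_wav(VIDEO, start, end)
    for w in word_timestamps(wav, model):
        if w["word"].lower() == SEARCH_TERM.lower():
            # vosk times are relative to the clip; shift back to video time
            fragments.append((start + w["start"], start + w["end"]))

print(fragments)
```

Since vosk would only ever see the handful of sentences that match the search, this should be much faster than transcribing the whole file up front.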

That's an interesting idea - I'd definitely be open to experimenting with it... Alignment might also work here. alphacep/vosk-api#756