tazz4843/whisper-rs

Add Example for Streaming Transcription as It Is Being Processed


Hello, I've been using whisper-rs for speech-to-text transcription in my Flutter app through Dart FFI.

While the existing examples show how to transcribe an audio file fully before any output is available, I couldn't find one that shows how to receive transcriptions incrementally as the file is being processed.

To clarify, I'm not referring to real-time transcription of live audio but rather the ability to get pieces of transcription as the package processes a pre-recorded audio file.

I went through the source code and found that the functionality is already implemented through params, but I could not piece them together into a working example.
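To make the request concrete, here is a rough sketch of the shape I have in mind, using only the Rust standard library. `transcribe_with_callback` and `Segment` are hypothetical stand-ins that just replay canned text; they are not whisper-rs APIs. The point is only that the consumer sees each segment as soon as the worker produces it, rather than after the whole file is done.

```rust
use std::sync::mpsc;
use std::thread;

/// One piece of transcript. Hypothetical type, not part of whisper-rs.
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Hypothetical stand-in for a transcriber that reports each segment
/// through a callback as soon as it is ready. Here it only replays
/// canned text, pretending each segment lasts one second.
fn transcribe_with_callback<F: FnMut(Segment)>(fake_segments: &[&str], mut on_segment: F) {
    for (i, text) in fake_segments.iter().enumerate() {
        let start_ms = i as u64 * 1_000;
        on_segment(Segment {
            start_ms,
            end_ms: start_ms + 1_000,
            text: text.to_string(),
        });
    }
}

/// Runs the (fake) transcription on a worker thread and returns a
/// receiver that yields segments in the order they are produced.
fn stream_segments(fake_segments: Vec<&'static str>) -> mpsc::Receiver<Segment> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        transcribe_with_callback(&fake_segments, |seg| {
            // A send error only means the receiver was dropped.
            let _ = tx.send(seg);
        });
        // `tx` is dropped when this closure returns, which closes the
        // channel and ends the consumer's loop.
    });
    rx
}

fn main() {
    // The loop body runs once per segment, as each one arrives.
    for seg in stream_segments(vec!["hello", "world"]) {
        println!("[{} - {} ms] {}", seg.start_ms, seg.end_ms, seg.text);
    }
}
```

In a real integration the callback would presumably be whatever hook the library exposes for new segments, and the receiving side would forward each segment over FFI to the UI.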

Whisper itself, as a model, isn't great for streaming, since it processes audio in 30-second chunks. You may be able to hack something together with FullParams::set_filter_logits_callback, but it likely won't be pretty. If you really want this feature, it's likely best requested upstream.

See also #82 and #26
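For a sense of what the 30-second granularity means for a pre-recorded file, here is a minimal sketch that cuts a sample buffer into 30-second windows and hands each one to a callback. It assumes 16 kHz mono `f32` samples, which is what Whisper expects; `for_each_window` is a hypothetical helper, not something whisper-rs provides, and it does no transcription itself.

```rust
/// Whisper models expect 16 kHz mono audio.
const SAMPLE_RATE: usize = 16_000;
/// Whisper processes audio in windows of 30 seconds.
const WINDOW_SECONDS: usize = 30;

/// Splits `samples` into consecutive windows of at most 30 seconds and
/// passes each window, with its start time in seconds, to `on_window`.
/// Hypothetical helper for illustration only.
fn for_each_window<F: FnMut(f64, &[f32])>(samples: &[f32], mut on_window: F) {
    let window_len = SAMPLE_RATE * WINDOW_SECONDS;
    for (i, window) in samples.chunks(window_len).enumerate() {
        let start = (i * WINDOW_SECONDS) as f64;
        on_window(start, window);
    }
}

fn main() {
    // 70 seconds of silence stands in for decoded audio.
    let samples = vec![0.0_f32; SAMPLE_RATE * 70];
    for_each_window(&samples, |start, window| {
        let secs = window.len() as f64 / SAMPLE_RATE as f64;
        // A real caller would transcribe `window` here and emit the text.
        println!("window at {start:.0}s, {secs:.0}s long");
    });
}
```

Cutting at fixed boundaries like this can split a word in two, so output from a chunk-by-chunk approach is likely to be rougher than a single pass over the whole file.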

Oh, I thought this functionality was already present in whisper.cpp, since I saw that some apps using whisper.cpp stream the transcript segment by segment.

By the way, isn't FullParams::set_single_segment related to streaming?

I just copied over any comments from whisper.cpp itself, so I haven't tried anything. That looks like it may help, but as with everything else, you'd be best off trying it for yourself.

I'll experiment with FullParams::set_single_segment to see if it meets my requirements for incremental transcription.

Since this is more of an exploratory task on my end and not an issue with the package itself, I'll go ahead and close this issue. Thanks again for your guidance.

@haaris94
Hey, did you make any progress with that? I'm interested in adding streaming support from the mic / system audio in the app vibe