alesaccoia/VoiceStreamAI

Feature Request: color coding confidence of tokens/words


When audio quality is poor, it's really helpful to have visual indicators for low-confidence words. They make the transcription easier to interpret and can help you figure out what was actually said. Additionally, if you give a language model the confidence scores along with more context about the conversation, it may be able to correct the uncertain words based on that context.

Currently, neither the default pipeline nor Whisper exposes a confidence level for each word. However, the discussion in openai/whisper#284 and the pull request openai/whisper#1119 contain suggestions that could help with adding this feature.
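A minimal sketch of the idea, assuming you already have each decoded token's text and log-probability. How to obtain them depends on the backend; with Hugging Face `transformers`, calling `model.compute_transition_scores(...)` on the output of `generate(..., output_scores=True, return_dict_in_generate=True)` is one possible route. The sketch assumes Whisper-style BPE tokens where a leading space starts a new word, scores each word by the geometric mean of its token probabilities (one choice among several; the minimum token probability is a stricter alternative), and wraps low-confidence words in `<span>` tags. The function names, the CSS classes `conf-low` / `conf-mid`, and the 0.5 / 0.8 thresholds are hypothetical, not part of VoiceStreamAI.

```python
import html
import math
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # geometric mean of the word's token probabilities, in [0, 1]


def _make_word(text: str, logprobs: list[float]) -> Word:
    # exp(mean log-prob) == geometric mean of the token probabilities
    return Word(text.strip(), math.exp(sum(logprobs) / len(logprobs)))


def tokens_to_words(tokens: list[str], logprobs: list[float]) -> list[Word]:
    """Group decoded text tokens into words and score each word.

    Assumption: special/timestamp tokens were already filtered out, and a
    token starting with a space begins a new word (Whisper-style BPE).
    """
    if len(tokens) != len(logprobs):
        raise ValueError("tokens and logprobs must have the same length")
    words: list[Word] = []
    text, lps = "", []
    for tok, lp in zip(tokens, logprobs):
        # A leading space closes the previous word (if any) and starts a new one
        if tok.startswith(" ") and text:
            words.append(_make_word(text, lps))
            text, lps = "", []
        text += tok
        lps.append(lp)
    if text:
        words.append(_make_word(text, lps))
    return words


def to_html(words: list[Word], low: float = 0.5, high: float = 0.8) -> str:
    """Wrap low/medium-confidence words in spans (thresholds are arbitrary)."""
    parts = []
    for w in words:
        safe = html.escape(w.text)
        if w.confidence < low:
            parts.append(f'<span class="conf-low">{safe}</span>')
        elif w.confidence < high:
            parts.append(f'<span class="conf-mid">{safe}</span>')
        else:
            parts.append(safe)
    return " ".join(parts)
```

For example, tokens `[" Hello", " wor", "ld", " foo"]` with probabilities 0.9, 0.4, 0.9 and 0.2 yield the words `Hello` (0.9), `world` (0.6) and `foo` (0.2), rendered as `Hello <span class="conf-mid">world</span> <span class="conf-low">foo</span>`. A front end can then style those classes with, say, yellow and red highlighting.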

Hey Kirill, cheers for the links! I noticed the same thing about the HF pipeline not giving us confidence scores, but it looks like we can sort it out with Torch. Right now the project is just a rough first go, without much technical finesse. I'm planning to dive back into it early next year. First, it's super important to have a unit test that uses some solid ground truth, like a real-world test run; I'll figure something out for that. Once that's sorted, there's heaps of room for improvement and new features.
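If the model only hands back raw logits for each decoding step, the token log-probabilities can be recovered with a log-softmax. In PyTorch, for logits of shape `(steps, vocab)` and chosen `token_ids` of shape `(steps,)`, that is roughly `torch.log_softmax(logits, dim=-1).gather(-1, token_ids.unsqueeze(-1))`. The dependency-free sketch below does the same arithmetic for one sequence so the steps are visible. It assumes greedy decoding over unmodified logits (logit processors or beam search would change the numbers), and `chosen_token_logprobs` is a hypothetical helper name.

```python
import math


def chosen_token_logprobs(step_logits: list[list[float]], token_ids: list[int]) -> list[float]:
    """Return log p(chosen token) at each decoding step via a stable log-softmax."""
    if len(step_logits) != len(token_ids):
        raise ValueError("need one chosen token id per decoding step")
    out = []
    for logits, tid in zip(step_logits, token_ids):
        m = max(logits)
        # log-sum-exp with the max subtracted so exp() cannot overflow
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        # log_softmax(logits)[tid] = logits[tid] - logsumexp(logits)
        out.append(logits[tid] - lse)
    return out
```

For instance, two equal logits give the chosen token a probability of 0.5, and logits `[log 3, 0]` give token 0 a probability of 0.75. The resulting per-token log-probabilities are what a word-grouping and color-coding step would consume.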

Done!