absadiki/pywhispercpp

[Feature Request] Just Language Detection


Thanks a lot for making these bindings. I've found them very useful, since they're fast and cross-platform.

I have a feature request:

In many cases, you can quickly eliminate a lot of hallucinations by running automatic language detection and checking whether the probability of the detected language is sufficiently high. Detecting the language up front is also useful for making sure you are using the correct model before running inference.
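The filtering idea above can be sketched as a confidence gate followed by a model choice. This is a minimal sketch, not pywhispercpp code: `should_transcribe`, `pick_model`, and the 0.5 threshold are hypothetical, and the language code and probability are assumed to come from whatever language-detection call you use. "base.en" and "base" are used only as example whisper model names.

```python
# Hypothetical threshold: tune it on your own data.
MIN_LANGUAGE_PROB = 0.5


def should_transcribe(lang_prob: float, threshold: float = MIN_LANGUAGE_PROB) -> bool:
    """Accept the clip only if the detector is confident about its language."""
    return lang_prob >= threshold


def pick_model(lang: str) -> str:
    """Choose an English-only model for English audio, a multilingual one otherwise."""
    # Example model names; substitute the models you actually have available.
    return "base.en" if lang == "en" else "base"


# Example using the values from the log line shown in this issue.
lang, prob = "en", 0.983850
if should_transcribe(prob):
    print(pick_model(lang))  # base.en
else:
    print("skipping: language detection not confident enough")
```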

Right now the Model object only provides a transcribe method. However, as this commit in whisper.cpp (ggml-org/whisper.cpp#853) shows, whisper.cpp has another parameter, called detect language, that runs only the language-detection pipeline. Exposing pwcpp.whisper_lang_auto_detect through a more Pythonic API that gives access to the probability of the detected language would be very helpful.
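As a rough illustration of what such a Pythonic API could return, the sketch below turns a per-language probability vector into the best language, its probability, and a full lookup table. It assumes the low-level call yields one probability per language id; `summarize_lang_probs` and the short `LANG_CODES` list are hypothetical and not part of pywhispercpp.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical subset of language codes, indexed by language id.
# The real table lives in whisper.cpp and has many more entries.
LANG_CODES = ["en", "zh", "de", "es", "ru"]


def summarize_lang_probs(
    probs: Sequence[float], codes: Sequence[str] = LANG_CODES
) -> Tuple[Tuple[str, float], Dict[str, float]]:
    """Turn a per-language probability vector into ((best_lang, best_prob), all_probs)."""
    if len(probs) != len(codes):
        raise ValueError("expected one probability per language code")
    all_probs = dict(zip(codes, probs))
    # Pick the language id with the highest probability.
    best_lang = max(all_probs, key=all_probs.get)
    return (best_lang, all_probs[best_lang]), all_probs


(best, p), table = summarize_lang_probs([0.90, 0.02, 0.05, 0.02, 0.01])
print(best, p)  # en 0.9
```

Returning both the top pick and the full table lets callers apply a simple threshold or inspect the runner-up languages without a second detection pass.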

>>> segments = model.transcribe("file.mp3", language="")
whisper_full_with_state: auto-detected language: en (p = 0.983850)

Right now it prints the details to the logs, but I don't see a way to access them programmatically!
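Until a proper API exists, one stopgap is to parse the log line itself. The sketch below only extracts the language and probability from a line in the format shown above; capturing that output is left out, since it is printed by whisper.cpp rather than by Python, and the log format may change between versions. `parse_detected_language` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches lines like: "whisper_full_with_state: auto-detected language: en (p = 0.983850)"
_LANG_RE = re.compile(r"auto-detected language: (\w+) \(p = ([0-9.]+)\)")


def parse_detected_language(log_line: str) -> Optional[Tuple[str, float]]:
    """Return (language, probability) from a whisper.cpp log line, or None if absent."""
    match = _LANG_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


line = "whisper_full_with_state: auto-detected language: en (p = 0.983850)"
print(parse_detected_language(line))  # ('en', 0.98385)
```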

You are welcome, @autolyticus. I'm glad you found it useful.

Yes, I agree: it would be useful to run only the language-detection pipeline, or to get the logits, without running the whole transcription.
I believe whisper_lang_auto_detect is already exposed in the bindings, but you're right that there doesn't seem to be a Pythonic API for it.

It shouldn't be that hard to implement, though. Let me check what I can do.

Here you go: you can get the auto-detected language and the probabilities as follows:

from pywhispercpp.model import Model

model = Model(...)  # pass your model name or path here
detected_language, probs = model.auto_detect_language("path/to/media/file")
print(detected_language)
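If `probs` is a mapping from language code to probability (an assumption; check the return type in your installed version of pywhispercpp), you can rank the candidate languages, for example to see how close the runner-up is to the top pick. `top_languages` and the example values are hypothetical.

```python
from typing import Dict, List, Tuple


def top_languages(probs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable languages, highest first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical probabilities, in the assumed dict shape.
example_probs = {"en": 0.91, "de": 0.04, "nl": 0.03, "fr": 0.02}
print(top_languages(example_probs, k=2))  # [('en', 0.91), ('de', 0.04)]
```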

Please pull the latest commits and give it a try.
Let me know if you find any issues.

Hi, thanks for your quick response. I'm able to get the language info now, which really helps with reducing transcription errors :)

Thanks a tonne!