shirayu/whispering

Remove multi language feature (Revert #20)

shirayu opened this issue · 1 comments

I read the Whisper code and noticed that a multilingual tokenizer is not supported in Whisper.

When language is None, the tokenizer for the "multilingual" Whisper models (tiny, base, small, medium, large) is not a tokenizer for all languages; it falls back to English (en).

https://github.com/openai/whisper/blob/9e653bd0ea0f1e9493cb4939733e9de249493cfb/whisper/tokenizer.py#L295-L316

    if multilingual:
        tokenizer_name = "multilingual"
        task = task or "transcribe"
        language = language or "en"
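
To make the fallback concrete, here is a minimal standalone sketch of the logic quoted above. The helper name resolve_tokenizer_settings is hypothetical and not part of Whisper's API, and the English-only branch is an assumption, since that part of the function is not quoted in this issue.

```python
from typing import Optional, Tuple


def resolve_tokenizer_settings(
    multilingual: bool,
    task: Optional[str] = None,
    language: Optional[str] = None,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Hypothetical helper mirroring the quoted fallback; not Whisper's API."""
    if multilingual:
        tokenizer_name = "multilingual"
        task = task or "transcribe"
        # A missing language silently becomes English here.
        language = language or "en"
    else:
        # Assumption: this branch is not quoted in the issue.
        tokenizer_name = "gpt2"
        task = None
        language = None
    return tokenizer_name, task, language


# language=None on a multilingual model resolves to "en".
print(resolve_tokenizer_settings(True, language=None))
# An explicit language is passed through unchanged.
print(resolve_tokenizer_settings(True, language="ja"))
```

Under these assumptions, passing language=None to a multilingual model yields the same settings as passing "en" explicitly, which is the behavior this issue describes.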

Revert #20
Related to #21

Hi @shirayu ,

I see you have reverted my PR to fix #23. I understand that the tokenizer does not support a multilingual mode; however, multilingual transcription works fine on my end.
I think there is an error in the code you have edited, as language should be None and not opts.language.
Thanks for looking into this :)