huggingface/peft

How to finetune a Whisper model with 'initial_prompt'

v-yunbin opened this issue · 4 comments

When I decode with 'initial_prompt', the results of the Whisper v2 model fine-tuned on my data are bad; without 'initial_prompt', the results are good.
However, the base Whisper v2 model decodes well even when 'initial_prompt' is used. Does this mean that if I want to use 'initial_prompt' during decoding, I also have to include it during training?

Sorry, I don't understand your issue. Could you please explain in more detail what you want to achieve and how? Ideally, show the code that leads to the good and bad results.

Hi. Whisper can use context information to improve recognition accuracy. To pass context information to Whisper, you can use the following CLI argument:
https://github.com/openai/whisper/blob/main/whisper/transcribe.py#L531
parser.add_argument("--initial_prompt", type=str, default=None, help="optional text to provide as a prompt for the first window.")
If the model is fine-tuned without "--initial_prompt", its decoding results get worse when "--initial_prompt" is then used at inference time.
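
For context, the same option is available in the openai-whisper Python API, not just the CLI. A minimal sketch of how it is used at inference time (the audio path and prompt text are placeholders):

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("large-v2")

# initial_prompt conditions the first 30-second window, e.g. with domain terms.
result = model.transcribe(
    "audio.wav",  # placeholder path
    initial_prompt="Glossary: PEFT, LoRA, Whisper fine-tuning.",
)
print(result["text"])
```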

I see. I don't really have any expertise in Whisper or in how the initial prompt affects the outcome, but my best guess is yes: if you want to use it at inference, you should also use it during training, following the same logic as in the script you linked.
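
For anyone landing here later, one possible way to mirror that prompting scheme during fine-tuning with the Hugging Face `WhisperTokenizer` is sketched below. This is an assumption about how it could be done, not an approach confirmed in this thread: `build_prompted_example` is a hypothetical helper, `get_prompt_ids` prepends the `<|startofprev|>` token, and the prompt is masked out of the loss while still being visible to the decoder.

```python
import torch
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
tokenizer = processor.tokenizer

def build_prompted_example(prompt_text: str, transcript: str):
    """Concatenate prompt and transcript tokens; mask the prompt from the loss."""
    # <|startofprev|> followed by the prompt tokens -- the same token layout
    # Whisper uses for initial_prompt at inference time.
    prompt_ids = torch.as_tensor(tokenizer.get_prompt_ids(prompt_text), dtype=torch.long)
    # Transcript tokens with Whisper's usual special tokens
    # (<|startoftranscript|>, any configured language/task tokens, <|endoftext|>).
    target_ids = tokenizer(transcript, return_tensors="pt").input_ids[0]

    full_ids = torch.cat([prompt_ids, target_ids])
    labels = full_ids.clone()
    # Mask the prompt positions so the loss is computed only on the transcript.
    labels[: prompt_ids.shape[0]] = -100
    # How these are fed to the model (shifting full_ids for decoder_input_ids,
    # padding, batching) depends on your data collator / training setup.
    return {"prompted_ids": full_ids, "labels": labels}
```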