microsoft/Olive

Whisper model does not work if you add the --enable_timestamps flag

DimQ1 opened this issue · 7 comments

Describe the bug
I updated the framework to the latest version and tried to convert the whisper-tiny model with timestamp prediction enabled, but the same thing happens with all models.

To Reproduce
python prepare_whisper_configs.py --model_name openai/whisper-tiny --no_audio_decoder --multilingual --enable_timestamps
then
python -m olive.workflows.run --config whisper_gpu_int8.json --setup
then
python -m olive.workflows.run --config whisper_gpu_int8.json

then
python test_transcription.py --config whisper_gpu_int8.json --predict_timestamps
["<|0.00|> I don't think that's the only way to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to be able to"]

Expected behavior
["<|0.02|> The cut on his chest, still dripping blood. The ache of his over-strained eyes.<|5.20|><|5.92|> Even the soaring arena around him with thousands of spectators were trivialities not worth thinking about.<|12.68|>"]
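One rough way to tell the looping output from the expected one automatically is to count how often the most common word n-gram repeats. This is only a sketch: the helper name `max_ngram_repeats`, the choice of n=4, and the shortened sample strings are assumptions for illustration and are not part of Olive or test_transcription.py.

```python
from collections import Counter

def max_ngram_repeats(text, n=4):
    """Return how many times the most frequent word n-gram occurs in text."""
    words = text.split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return max(grams.values(), default=0)

# Shortened stand-ins for the two outputs shown above (timestamps removed).
looped = "I don't think that's the only way " + "to be able " * 20
expected = "The cut on his chest, still dripping blood. The ache of his over-strained eyes."

print("looped:", max_ngram_repeats(looped))
print("expected:", max_ngram_repeats(expected))
```

A high count for one string and a count of 1 for the other suggests the decoder got stuck in a loop. Any threshold would have to be tuned by hand.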

Other information

  • OS: Windows
  • Olive version: main
  • ONNXRuntime package and version: onnxruntime-gpu: 1.17.0

Additional context
I think it happens because onnxruntime was updated:
logits_processor.h#L162C2-L162C79
Previously it was hardcoded:
logits_processor.h#L160

I already made a PR to support the latest changes in ORT #1016 and it worked fine when I tested it.

Could you try again with the latest ORT 1.17.1 and latest code from olive main branch?

Never mind. I get the same output with the latest version of the packages. I think this might be because the quantized whisper-tiny model (both CPU and GPU) is not accurate enough to begin with.

Quantization introduces weight errors: quantized weights cannot represent the float weights exactly, so they carry an inherent quantization error. For the tiny model, this error seems to be large enough to make the quantized models unusable with or without timestamps.
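To make the round-trip error concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The weight values are made up, and this is not necessarily the scheme that Olive or ONNX Runtime applies to Whisper. It only illustrates why the dequantized weights differ from the originals.

```python
# Illustration only: symmetric per-tensor int8 quantization with a single scale.
# The scheme and the weights below are assumptions, not taken from Olive or ORT.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer values."""
    return [v * scale for v in q]

weights = [0.8123, -0.0041, 0.0372, -0.5519, 0.0007]  # made-up float weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

for w, v, r in zip(weights, q, restored):
    print(f"{w:+.4f} -> {v:+4d} -> {r:+.4f}")
print(f"scale={scale:.6f} max_abs_error={max_error:.6f}")
```

In this scheme each weight can be off by up to half a quantization step, and small weights such as 0.0007 collapse to zero. A small model has fewer parameters to absorb that kind of error, which is consistent with the explanation above, though the sketch does not prove it for whisper-tiny.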

OK, that is clear regarding the whisper-tiny model. But I have trouble with whisper-large-v3.
I tried converting whisper-medium and it works fine. But then I converted whisper-large-v3 and it didn't work properly.
Here is the output for whisper-large-v3:
['<|0.00|> the cut on his chest still dripping blood the ache of his overstrained eyes even the soaring arena around him with the thousands of spectators were trivialities not worth thinking about<|18.36|>']
Here is the output for whisper-large-v2:
['<|0.00|> The cut on his chest is still dripping blood, the ache of his overstrained eyes, even the<|9.08|><|9.08|> soaring arena around him with the thousands of spectators were trivialities not worth<|16.60|><|16.60|> thinking about.<|17.72|>']
Do you have any idea why this is?
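To compare the two outputs above more easily, the timestamp tokens can be split into (start, end, text) segments. This is a rough sketch: the helper `split_segments` and the regular expression are assumptions for illustration and are not part of Olive or test_transcription.py.

```python
import re

# Matches Whisper-style timestamp tokens such as <|9.08|>.
TIMESTAMP = re.compile(r"<\|(\d+\.\d+)\|>")

def split_segments(text):
    """Return (start, end, text) tuples from a timestamped transcription string."""
    parts = TIMESTAMP.split(text)
    # With one capture group, parts alternates: text, stamp, text, stamp, ...
    stamps = [float(p) for p in parts[1::2]]
    texts = [p.strip() for p in parts[2::2]]
    segments = []
    for i, chunk in enumerate(texts):
        # Keep only non-empty text that is followed by a closing timestamp.
        if chunk and i + 1 < len(stamps):
            segments.append((stamps[i], stamps[i + 1], chunk))
    return segments

large_v3 = "<|0.00|> the cut on his chest still dripping blood the ache of his overstrained eyes even the soaring arena around him with the thousands of spectators were trivialities not worth thinking about<|18.36|>"
large_v2 = "<|0.00|> The cut on his chest is still dripping blood, the ache of his overstrained eyes, even the<|9.08|><|9.08|> soaring arena around him with the thousands of spectators were trivialities not worth<|16.60|><|16.60|> thinking about.<|17.72|>"

for name, text in (("large-v3", large_v3), ("large-v2", large_v2)):
    print(name)
    for start, end, chunk in split_segments(text):
        print(f"  {start:6.2f} - {end:6.2f}  {chunk}")
```

Run on the strings above, large-v2 yields three segments while large-v3 yields a single unpunctuated segment covering the whole clip.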

@DimQ1 large-v3 doesn't support enabling timestamps

What is the reason? I thought it was because some parameters were hardcoded, but now they aren't.

@DimQ1 I'm not sure about the reason, but I found this in the README:

No action item from Olive team. Closing this issue. Please reopen this if you have more questions.