microsoft/Olive

Using Whisper for Chinese ASR in iOS may occasionally output illegal UTF-8 strings.

hasayakey opened this issue · 2 comments

Describe the bug

I followed the Whisper example at https://github.com/microsoft/Olive/tree/main/examples/whisper and generated the model with the following command: python prepare_whisper_configs.py --model_name openai/whisper-tiny --no_audio_decoder --multilingual --enable_timestamps | olive run --config whisper_cpu_int8.json 2> /dev/null

Running on the CPUExecutionProvider makes the iPhone overheat severely, so I adopted the following strategy: I run an ORTSession every 2 seconds to get the transcribed text and, based on the timestamps in the returned text, decide whether to discard the audio samples that have already been transcribed correctly.

Most of the time the text is output normally, but occasionally the output is an illegal UTF-8 string, which causes onnxruntime-objc to crash.
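The buffering strategy described above can be sketched roughly as follows. This is only an illustration in Python, not the reporter's iOS code: it assumes the returned text contains Whisper-style timestamp markers such as `<|1.50|>` in start/end pairs and that the audio is 16 kHz mono. The names `last_closed_segment_end` and `trim_transcribed` are hypothetical.

```python
import re

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio, as Whisper models expect

# Assumption: timestamps appear in the decoded text as markers like <|12.34|>.
TIMESTAMP_RE = re.compile(r"<\|(\d+\.\d+)\|>")


def last_closed_segment_end(text: str) -> float:
    """Return the end time in seconds of the last complete (start, end) pair, or 0.0."""
    stamps = [float(s) for s in TIMESTAMP_RE.findall(text)]
    # Markers come in (start, end) pairs; a trailing unpaired start marks a
    # segment that is still being spoken, so it is ignored.
    complete_pairs = len(stamps) // 2
    return stamps[2 * complete_pairs - 1] if complete_pairs else 0.0


def trim_transcribed(samples: list, text: str) -> list:
    """Drop the samples covered by segments that were fully transcribed."""
    end_seconds = last_closed_segment_end(text)
    return samples[round(end_seconds * SAMPLE_RATE):]
```

With this approach, the audio belonging to an unfinished segment stays in the buffer and is transcribed again on the next 2-second pass.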

Crash stack: microsoft/onnxruntime#21026
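For readers unfamiliar with the failure mode: Whisper's tokenizer works at the byte level, so the bytes of one multi-byte character (most Chinese characters take three bytes in UTF-8) can be spread over several tokens. One possible source of an illegal string, not confirmed in this issue, is output that stops partway through such a character. The Python sketch below only illustrates how truncated bytes fail strict decoding and how a lossy decode substitutes U+FFFD instead. It is not a fix for the crash, which happens inside onnxruntime-objc; the helper names are hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_utf8_lossy(raw: bytes) -> str:
    """Decode bytes, replacing malformed sequences with U+FFFD instead of raising."""
    return raw.decode("utf-8", errors="replace")


# Simulate output cut off in the middle of the second character.
complete = "你好".encode("utf-8")   # 6 bytes, 3 per character
truncated = complete[:-1]           # last character is missing its final byte

print(is_valid_utf8(complete))      # strict decoding succeeds
print(is_valid_utf8(truncated))     # strict decoding fails
print(decode_utf8_lossy(truncated)) # first character intact, remainder replaced
```

A consumer that decodes strictly will reject the truncated bytes, while a lossy decode keeps the valid prefix.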

Other information

  • OS: iOS
  • Olive version: 0.7.0
  • ONNXRuntime package and version: onnxruntime-objc: 1.18.0

Hi,

Thanks for creating the issue. It looks like you have already opened a related issue in the onnxruntime repository, which is a good place to ask since the model is generated using onnxruntime contrib operators. If the issue cannot be resolved there, the devs at https://github.com/microsoft/onnxruntime-extensions might have more insight, since they created the post-processing parts of the model.

Hello,
I have encountered a similar issue while trying to use Olive Whisper to transcribe Tajik. The model produced by Olive performs far worse than a basic ONNX model and suffers from severe hallucinations. The Olive model also occasionally produces illegal UTF-8 strings, as you mentioned. I have been unable to find an explanation or a fix for this.