intel/neural-speed

Documentation for whisper inference

Closed this issue · 2 comments

I tried running inference (transformer-like usage, since llama.cpp-style usage is apparently not available for whisper) and installed intel_extension_for_transformers, but it now fails on:

import neural_speed.whisper_cpp as cpp_model
ModuleNotFoundError: No module named 'neural_speed.whisper_cpp'
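A quick way to check whether the compiled whisper module was actually built into a neural_speed install is to probe for it before importing. This is a minimal sketch; `neural_speed.whisper_cpp` is simply the module named in the error above, and the helper works for any module path:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be resolved, without executing its import."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises this when a parent package (e.g. neural_speed)
        # is itself not installed.
        return False

# Example probe before relying on the import:
# module_available("neural_speed.whisper_cpp")
```

If this returns False after `pip install .`, the extension was likely not built into the wheel, which points at a build/packaging issue rather than a usage error.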

I installed neural-speed as described in the docs, i.e.,

pip install -r requirements.txt
pip install .

and successfully ran phi-1.5 inference in the llama.cpp style.

Please advise how to run whisper inference and, as with the other models, please also add 3-bit inference support for whisper.

You can use this PR and install neural_speed again. We don't currently support 3-bit inference; it is still in development.

Thanks, that example worked, so I'm closing the issue.
However, after the first few seconds of inference it starts using only one CPU core (both with and without the OMP_NUM_THREADS environment variable). Just informing you; it's not a big problem for me.
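For the single-core symptom, one thing worth trying (a sketch, not a confirmed fix for this issue) is setting OMP_NUM_THREADS from inside the script, before the OpenMP-backed library is first imported, since OpenMP runtimes typically read the variable once at initialization:

```python
import os

# Must run before importing the inference library; once the OpenMP
# runtime initializes, it may ignore later changes to this variable.
os.environ["OMP_NUM_THREADS"] = str(os.cpu_count() or 1)

# ... then import neural_speed / run inference as usual.
```

If the core count still drops mid-run, the cause is more likely inside the library's own threading (e.g. a single-threaded decode phase) than the environment setup.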