intel/neural-speed

Documentation for whisper inference

Closed this issue · 2 comments

I tried running inference (transformer-like usage, since llama.cpp-style usage is apparently not available for whisper) and installed intel_extension_for_transformers, but it now fails on:

import neural_speed.whisper_cpp as cpp_model
ModuleNotFoundError: No module named 'neural_speed.whisper_cpp'
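A quick way to check whether the compiled whisper module was actually built into a neural_speed install is to probe for it before importing. This is a minimal sketch; `neural_speed.whisper_cpp` is simply the module named in the error above, and the helper works for any module path:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be resolved, without executing its import."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises this when a parent package (e.g. neural_speed)
        # is itself not installed.
        return False

# Example probe before relying on the import:
# module_available("neural_speed.whisper_cpp")
```

If this returns False after `pip install .`, the extension was likely not built into the wheel, which points at a build/packaging issue rather than a usage error.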

I installed neural-speed as described in the docs, i.e.,

pip install -r requirements.txt
pip install .

and successfully ran phi-1.5 inference in the llama.cpp style.

Please advise how to run whisper inference and, as with the other models, please also add 3-bit inference support for whisper.

You can use this PR and install neural_speed again. We don't currently support 3-bit inference; it is still in development.

Thanks, that example worked, so I'm closing the issue.
However, after the first few seconds of inference it starts using only one CPU core (both with and without the OMP_NUM_THREADS environment variable). Just informing you; it's not a big problem for me.
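For the single-core symptom, one thing worth trying (a sketch, not a confirmed fix for this issue) is setting OMP_NUM_THREADS from inside the script, before the OpenMP-backed library is first imported, since OpenMP runtimes typically read the variable once at initialization:

```python
import os

# Must run before importing the inference library; once the OpenMP
# runtime initializes, it may ignore later changes to this variable.
os.environ["OMP_NUM_THREADS"] = str(os.cpu_count() or 1)

# ... then import neural_speed / run inference as usual.
```

If the core count still drops mid-run, the cause is more likely inside the library's own threading (e.g. a single-threaded decode phase) than the environment setup.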