Documentation for whisper inference
Closed this issue · 2 comments
I tried running inference (transformers-like usage, since llama.cpp-style usage is apparently not available for whisper) and installed intel_extension_for_transformers, but now it fails on:
import neural_speed.whisper_cpp as cpp_model
ModuleNotFoundError: No module named 'neural_speed.whisper_cpp'
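For anyone reproducing this, here is a quick standard-library check of which compiled submodules the installed package actually exposes (a diagnostic sketch; nothing whisper-specific is assumed beyond `neural_speed` itself being importable):

```python
import pkgutil

import neural_speed

# List the submodules shipped with the installed neural_speed package.
# If the whisper binding was not compiled into this build, 'whisper_cpp'
# will simply be absent from this list.
print(sorted(m.name for m in pkgutil.iter_modules(neural_speed.__path__)))
```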
I installed neural-speed the way the docs describe, i.e.,
pip install -r requirements.txt
pip install .
and successfully ran phi-1.5 inference in the llama.cpp style.
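For context, the llama.cpp-style flow that worked for phi-1.5 looked roughly like the sketch below. The `Model` API and the `weight_dtype`/`compute_dtype` parameter names follow the neural-speed README at the time, so treat the details as assumptions rather than a verified recipe:

```python
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

model_name = "microsoft/phi-1_5"  # the model that worked for me
prompt = "Once upon a time, there existed a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# init() downloads, quantizes, and loads the model: 4-bit weights with
# int8 compute, per the README example. 3-bit (the subject of this issue)
# is not available here.
model = Model()
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=128)
```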
Please advise how to run whisper inference, and, as with the other models, please also add 3-bit inference support for whisper.
You can use this PR and install neural_speed again. We don't currently support 3-bit inference; it is still in development.
Thanks, that example worked, so I'm closing the issue.
However, inference starts using only one CPU core after the first few seconds (both with and without the OMP_NUM_THREADS environment variable set). Just letting you know; it's not a big problem for me.
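In case it helps with reproducing: OpenMP runtimes read OMP_NUM_THREADS when they initialize, so the variable has to be in the environment before the library that links OpenMP is imported. A minimal sketch of the ordering (the value 8 and the plain `import neural_speed` are just illustrative assumptions):

```python
import os

# OpenMP reads OMP_NUM_THREADS when its runtime initializes, so the
# variable must be set *before* the extension module that links OpenMP
# is loaded; exporting it after the import has no effect.
os.environ["OMP_NUM_THREADS"] = "8"  # 8 is an arbitrary example value

import neural_speed  # imported only after the variable is set
```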