This project aims to provide a Whisper integration for Rhasspy. It is currently functional in a basic form and can run on both CPU and GPU using the provided Dockerfile.
Whisper is required; follow the installation instructions here. Then install the dependencies and start the server:
pip install -r requirements.txt
python main.py
Start the container
docker run -p 4444:4444 tiemajor/whisper-rhasspy-http:latest
OR start the container with GPU support (requires nvidia-container-runtime, more info here)
docker run --gpus all -p 4444:4444 tiemajor/whisper-rhasspy-http:latest
In Rhasspy, change your Speech to Text method to Remote HTTP and set the Speech to Text URL to http://[IP]:[PORT]/api/speech-to-text. Be sure to replace IP and PORT with actual values.
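To check the endpoint without going through Rhasspy, a small client along these lines may help. This is only a sketch: it assumes the server accepts raw WAV bytes in the body of a POST request and returns the transcript in the response body, which is how Rhasspy's Remote HTTP method usually talks to a server. The exact response format of this project is not documented here, and the helper names `build_stt_url`, `build_stt_request`, and `transcribe` are hypothetical.

```python
import urllib.request


def build_stt_url(host: str, port: int) -> str:
    """Build the Speech to Text URL described above."""
    return f"http://{host}:{port}/api/speech-to-text"


def build_stt_request(host: str, port: int, wav_bytes: bytes) -> urllib.request.Request:
    """Prepare a POST request carrying WAV audio (assumed request format)."""
    return urllib.request.Request(
        build_stt_url(host, port),
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def transcribe(host: str, port: int, wav_path: str, timeout: float = 60.0) -> str:
    """Send a WAV file to the server and return the raw response body as text."""
    with open(wav_path, "rb") as wav_file:
        wav_bytes = wav_file.read()
    request = build_stt_request(host, port, wav_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (requires a running server and a local WAV file):
# print(transcribe("127.0.0.1", 4444, "sample.wav"))
```

If the response turns out to be JSON rather than plain text, the last line of `transcribe` would need to parse it accordingly.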
Arguments:
--host: Set the bind address of the HTTP server. Defaults to 0.0.0.0
--port: Set the bind port of the HTTP server. Defaults to 4444
--filter-chars: Provide a list of characters to be filtered out of the recognized text. Defaults to None
--whisper-model: Define which model Whisper should use. Possible values are tiny, base, small, medium, large; more info here. Defaults to base
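For reference, the arguments listed above could be declared with `argparse` roughly as follows. This is an illustrative sketch that mirrors the documented names and defaults, not necessarily how main.py implements them; `build_parser` and `WHISPER_MODELS` are hypothetical names.

```python
import argparse

# Model names as listed in the argument description above.
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def build_parser() -> argparse.ArgumentParser:
    """Return a parser with the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        description="Whisper speech-to-text HTTP server for Rhasspy (sketch)"
    )
    parser.add_argument("--host", default="0.0.0.0",
                        help="Bind address of the HTTP server")
    parser.add_argument("--port", type=int, default=4444,
                        help="Bind port of the HTTP server")
    parser.add_argument("--filter-chars", default=None,
                        help="Characters to remove from the recognized text")
    parser.add_argument("--whisper-model", choices=WHISPER_MODELS, default="base",
                        help="Whisper model to load")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Declaring `--whisper-model` with `choices` makes an unsupported model name fail at startup instead of when the model is loaded.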
You can pass these arguments to your docker run command in the same way as you would when running main.py directly.
Some intent recognition services may have trouble recognizing intents because Whisper adds punctuation to the recognized text.
For example, "What time is it ?" might not be recognized while "What time is it" will be. The --filter-chars argument is meant for this case.
You can specify a list of characters that will automatically be filtered out. A good example would be --filter-chars ".?'"\!\"":;<>[]{}()"
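The filtering described here can be pictured as a simple character removal. The sketch below shows one way to do it in Python; it is an assumption about the behavior, not the project's actual code. In particular, `filter_text` is a hypothetical helper, and trimming leftover whitespace after removal is a choice made for this example.

```python
from typing import Optional

# The characters from the example above, written as a Python string.
EXAMPLE_FILTER_CHARS = ".?'!\":;<>[]{}()"


def filter_text(text: str, filter_chars: Optional[str]) -> str:
    """Remove every character in filter_chars from text.

    When filter_chars is None or empty (the documented default is None),
    the text is returned unchanged.
    """
    if not filter_chars:
        return text
    # str.maketrans with a third argument maps those characters to None,
    # so translate() deletes them.
    table = str.maketrans("", "", filter_chars)
    # Trim whitespace left behind by removed punctuation (assumption).
    return text.translate(table).strip()


if __name__ == "__main__":
    print(filter_text("What time is it ?", EXAMPLE_FILTER_CHARS))
```

With the example character list, "What time is it ?" becomes "What time is it", which is the form the intent recognizer is more likely to match.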
Build the docker image
docker build . -t whisper-rhasspy-http:latest