ROS node for speech to text using VOSK on a docker container
Works with ROS and Python 2.7. Tested with ROS Melodic.
This package provides real-time and offline speech-to-text using VOSK.
You should install VOSK in its websocket server version using docker. A configuration file for installing the docker container is provided.
This package provides a speech-to-text module designed to be fast and to run locally on the robot, avoiding the network delays that could jeopardize human-robot interaction. It relies on a relatively lightweight speech-to-text library, VOSK, which supports multiple languages. The trade-off is that recognition accuracy may be lower than that of other alternatives (e.g., Google Cloud Services). The advantage is that you can keep a speech-to-text module always active on the robot and get faster results.
This is a first draft of the module, and it works as a simple topic publisher (so neither an action server nor a ROS service is provided). In the coming months (hopefully) expect:
- This code to be refined and improved
- The possibility to have the VOSK speech-to-text service act as a ROS Action Server rather than as a ROS topic publisher.
If you want to contribute, feel free to do so!
ROS 1 does not support publishing non-ASCII characters over topics. So if your language contains non-ASCII characters, you have two options: remove them, or encode them as ASCII characters (and decode them on reading).
Both options are supported (set the variable "encoding" to "ascii" to remove them, or to "utf-8" to encode them in ASCII). Here you can find further references.
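To make the two options concrete, here is a minimal sketch in plain Python. The function names `to_ros_safe` and `from_ros_safe` are hypothetical, and the use of the `unicode_escape` codec for the "encode" option is an assumption chosen for illustration; the node itself may encode text differently, so check its source before relying on this on the reading side.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: these helpers are NOT part of the package.


def to_ros_safe(text, encoding="utf-8"):
    """Return an ASCII-only version of `text`.

    encoding == "ascii": non-ASCII characters are dropped.
    encoding == "utf-8": non-ASCII characters are kept as ASCII escape
    sequences (assumed scheme: Python's unicode_escape codec).
    """
    if encoding == "ascii":
        return text.encode("ascii", "ignore").decode("ascii")
    return text.encode("unicode_escape").decode("ascii")


def from_ros_safe(payload):
    """Reverse the escape-based encoding on the subscriber side."""
    return payload.encode("ascii").decode("unicode_escape")


if __name__ == "__main__":
    sentence = u"perch\u00e9 no"            # Italian word with an accented letter
    print(to_ros_safe(sentence, "ascii"))   # accented letter removed
    escaped = to_ros_safe(sentence, "utf-8")
    print(escaped)                          # ASCII-only, reversible
    print(from_ros_safe(escaped) == sentence)
```

Dropping characters is simpler for subscribers but loses information; escaping keeps the full text at the cost of a decoding step on every reader.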
- Start the docker container
- Start vosk_sst_publiser.py
- You should see three topics being published. All three publish String messages. The topics are:
  1. vosk/speech publishes sentences as they are recognized in their entirety. If you are interested in full sentences, use this topic.
  2. vosk/partial_speech publishes partial results in real time as speech is transcribed into text. If you need more accuracy, use the vosk/speech results. If you want quicker results (e.g., catching a "no" or "yes" utterance) and are not interested in the full sentence, use this topic.
  3. vosk/confidence publishes the confidence of each word published on the vosk/speech topic.
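As an example of consuming these topics, the sketch below listens on vosk/partial_speech and reacts as soon as a short keyword such as "yes" or "no" appears. The node name `vosk_keyword_listener` and the helpers `detect_keyword` and `on_partial` are hypothetical; the only things taken from this package are the topic name and the String message type. The ROS imports sit inside `main()` so that the keyword logic can be tried without a ROS installation.

```python
def detect_keyword(text, keywords=("yes", "no")):
    """Return the first keyword found as a whole word in `text`, else None."""
    words = text.lower().split()
    for keyword in keywords:
        if keyword in words:
            return keyword
    return None


def on_partial(msg):
    """Callback for vosk/partial_speech; `msg.data` holds the partial text."""
    answer = detect_keyword(msg.data)
    if answer is not None:
        print("Heard keyword: " + answer)
    return answer


def main():
    # Imported here so the helpers above stay usable without ROS.
    import rospy
    from std_msgs.msg import String

    rospy.init_node("vosk_keyword_listener")  # hypothetical node name
    rospy.Subscriber("vosk/partial_speech", String, on_partial)
    rospy.spin()

# On a machine with ROS, the container and the publisher running,
# call main() to start listening.
```

With the Italian model configured in the Dockerfile, the keywords would need to be Italian words instead, for example `detect_keyword(msg.data, keywords=("si", "no"))`.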
docker
ROS
You need Python 2.7. It may work with Python 3, but this has not been tested. Besides that, you need the following libraries (to be installed either with pip or from apt):
websockets
rospy
threading
pyaudio
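A quick way to check whether these libraries are available in your interpreter is sketched below. The helper `missing_modules` is hypothetical and not part of the package; it simply tries to import each name from the list above and reports the ones that fail.

```python
def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["websockets", "rospy", "threading", "pyaudio"]
    not_found = missing_modules(required)
    if not_found:
        print("Missing libraries: " + ", ".join(not_found))
    else:
        print("All required libraries are importable.")
```

Run it with the same interpreter that ROS uses, since a library installed for a different Python version will still show up as missing.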
- Install docker
- (Optional) Install a management tool for docker containers (e.g., dockstation)
- Install the container by using the Dockerfile provided in the repo. Here you can find some documentation about that.
Note that the Dockerfile is configured to use a VOSK model trained on Italian, so it recognizes Italian speech. If you need another language or want to change the model, edit the Dockerfile. To build the container, run:
docker build -f /path/to/a/Dockerfile .
- Start the Container
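
Once the container is started, it can be useful to confirm that the VOSK websocket server accepts connections before launching the ROS node. The sketch below only checks that a TCP port is open. The helper `is_server_reachable` is hypothetical, and port 2700 is an assumption based on the default used by the upstream VOSK server images; the Dockerfile in this repo may expose a different port.

```python
import socket


def is_server_reachable(host="localhost", port=2700, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Port 2700 is an assumed default; adjust it to your container's mapping.
    if is_server_reachable():
        print("VOSK server port is open.")
    else:
        print("Cannot reach the VOSK server; is the container running?")
```

An open port only shows that something is listening; it does not verify that the model loaded correctly.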