vosk_speech

ROS node for speech-to-text using VOSK in a Docker container

Works with ROS and Python 2.7. Tested with ROS Melodic.

This package provides real-time and offline speech-to-text using VOSK.

Install the websocket server version of VOSK using Docker. A Dockerfile for building the container is provided.

Description

This package provides a speech-to-text module designed to be fast and to run locally on the robot, avoiding network delays that could jeopardize human-robot interaction. It relies on VOSK, a relatively lightweight speech-to-text library that supports multiple languages. The trade-off is that recognition accuracy may be lower than that of alternatives such as Google Cloud services. In return, the speech-to-text module can stay always active on the robot and deliver results faster.

Important Notes

This is a first draft of the module. It works as a simple topic publisher, so no action server or ROS service is provided. In the following months (hopefully) expect:

this code to be refined and improved
the possibility to have the Vosk speech-to-text service act as a ROS action server instead of a topic publisher

If you want to contribute, feel free to do so!

UTF-8 Characters

ROS 1 does not support publishing non-ASCII characters over topics. If your language contains such characters, you have two options: remove them, or encode them as ASCII characters (and decode them on reading).

Both options are supported: set the variable "encoding" to "ascii" to remove them, or to "utf-8" to encode them as ASCII. Here you can find further references.
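The two options can be illustrated with a minimal sketch. The function names to_wire and from_wire are hypothetical and not part of this package. The sketch assumes that "ascii" drops non-ASCII characters and that "utf-8" publishes the UTF-8 encoded bytes, which the subscriber then decodes. The actual publisher may handle this differently.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def to_wire(text, encoding):
    """Turn recognized text into bytes that are safe to publish.

    encoding == "ascii": non-ASCII characters are dropped.
    encoding == "utf-8": characters are kept, encoded as UTF-8 bytes.
    """
    if encoding == "ascii":
        return text.encode("ascii", "ignore")
    if encoding == "utf-8":
        return text.encode("utf-8")
    raise ValueError("unsupported encoding: %r" % encoding)


def from_wire(data, encoding):
    """Subscriber side: recover text from the published bytes."""
    return data.decode(encoding)


recognized = u"perch\u00e9 no"            # Italian "perché no"
stripped = to_wire(recognized, "ascii")   # the accented letter is removed
encoded = to_wire(recognized, "utf-8")    # the accented letter becomes two bytes
restored = from_wire(encoded, "utf-8")    # identical to `recognized`
print(repr(stripped), repr(encoded), restored == recognized)
```

With "ascii" the subscriber needs no extra step but loses the accented letters. With "utf-8" every subscriber has to decode the message data before using it.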

Usage

Start the docker container
Start vosk_sst_publiser.py

You should see three topics being published, all carrying String messages:

1. vosk/speech publishes sentences once they have been recognized in their entirety. Use this topic if you are interested in full sentences.
2. vosk/partial_speech publishes partial results in real time, as the speech is being transcribed. Use this topic if you want quicker results (e.g., catching a "yes" or "no" utterance) and are not interested in the full sentence. If you need more accuracy, use the vosk/speech results.
3. vosk/confidence publishes the confidence of each word published on vosk/speech.
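As a rough sketch of how a consumer might use these topics, the example below listens to vosk/partial_speech for a quick "yes" or "no" and to vosk/speech for full sentences. The node name, the helper detect_yes_no, and the keyword lists are hypothetical. The sketch also assumes that the message data is plain text and that the node is not started inside a namespace that changes the topic names.

```python
def detect_yes_no(text, yes_words=("yes",), no_words=("no",)):
    """Return "yes", "no", or None depending on the first keyword in `text`."""
    for word in text.lower().split():
        if word in yes_words:
            return "yes"
        if word in no_words:
            return "no"
    return None


def listen():
    """Subscribe to the partial and full speech topics (needs a running ROS master)."""
    # Imported here so the helper above can be used without a ROS installation.
    import rospy
    from std_msgs.msg import String

    def on_partial(msg):
        # Partial results arrive early, so short answers can be caught quickly.
        answer = detect_yes_no(msg.data)
        if answer is not None:
            rospy.loginfo("quick answer: %s", answer)

    def on_sentence(msg):
        # Full sentences arrive later but are the more accurate result.
        rospy.loginfo("full sentence: %s", msg.data)

    rospy.init_node("vosk_listener_example")  # hypothetical node name
    rospy.Subscriber("vosk/partial_speech", String, on_partial)
    rospy.Subscriber("vosk/speech", String, on_sentence)
    rospy.spin()
```

Calling listen() from a script, with the publisher and the container running, starts the subscriber. For a model in another language, pass the matching keywords (for example "sì" for Italian, after decoding as described under UTF-8 Characters).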

Dependencies

Main dependencies

docker
ROS

Python dependencies

You need Python 2.7. It may work with Python 3, but this has not been tested. In addition, you need the following libraries (installable with pip or apt):

websockets
rospy
threading
pyaudio
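A quick way to confirm that these libraries are importable before launching the node is sketched below. The helper missing_modules is hypothetical and only checks that each module can be imported, not which version is installed.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Module names taken from the dependency list above.
required = ["websockets", "rospy", "threading", "pyaudio"]
print("missing modules: %s" % missing_modules(required))
```

An empty list means every dependency was found. Note that threading is part of the Python standard library, so it should never show up as missing.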

Docker container setup

Install docker
(Optional) Install a management tool for Docker containers (e.g., DockStation)
Install the container by using the Dockerfile provided in the repo. Here you can find some documentation about that.

Note that the Dockerfile is configured to use a VOSK model trained for the Italian language. If you need another language or want to change the model, edit the Dockerfile, then build the image:

docker build -f /path/to/a/Dockerfile .

Start the Container

aislabunimi/vosk_speech
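Once the container is running, a quick check that the server accepts connections can save some debugging before starting the publisher. The sketch below is only a TCP reachability test and does not perform the websocket handshake. The host and port are assumptions: 2700 is the default port of the upstream VOSK server images, but the right value depends on how the container's port is published on your machine.

```python
import socket


def server_is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


# Assumed values: adjust them to match how the container was started.
HOST, PORT = "localhost", 2700
print("VOSK server reachable: %s" % server_is_reachable(HOST, PORT))
```

If this prints False, check that the container is running and that its port is mapped to the host.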