tbd_audio_stack

COPYRIGHT(C) 2020 - Transportation, Bots, and Disability Lab - CMU
Code released under MIT.
Contact - Zhi - zhi.tan@ri.cmu.edu

A collection of ROS packages that handle audio processing from capture to recognition (utterances). The collection consists of the following packages:

tbd_audio_msgs

This package contains the ROS messages used throughout the collection.

tbd_audio_capture

Currently, this package republishes the audio signal from audio_capture using our own message type (tbd_audio_msgs/AudioStamped), which carries the same data and adds a header with the originating time.
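One reason the originating time is useful: a consumer can work out when a chunk of audio ended, which helps line audio up with other time-stamped signals. The sketch below is plain Python, not the package's code. `StampedChunk` is a hypothetical stand-in for the real message (its actual fields are not listed here), and it assumes 16-bit mono PCM at 16 kHz with the stamp marking the first sample.

```python
from dataclasses import dataclass


@dataclass
class StampedChunk:
    """Hypothetical stand-in for a stamped audio message (not the real type)."""
    stamp: float  # assumed: capture time of the first sample, in seconds
    data: bytes   # assumed: raw 16-bit mono PCM


def chunk_end_time(chunk, sample_rate=16000, sample_width=2):
    """Return the time at which the last sample of the chunk was captured."""
    n_samples = len(chunk.data) // sample_width
    return chunk.stamp + n_samples / sample_rate
```

With these assumptions, a 32000-byte chunk stamped at t = 10.0 s covers one second of audio and ends at t = 11.0 s.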

tbd_audio_vad

This package is a wrapper for WebRTCVADPy that performs voice activity detection (VAD) on the received stamped audio.
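As a rough illustration of what such a wrapper has to do, the sketch below cuts a PCM buffer into the fixed-size frames that the `webrtcvad` Python package accepts (16-bit mono PCM at 8, 16, 32, or 48 kHz, in 10, 20, or 30 ms frames) and classifies each one. The function names `split_frames` and `speech_flags` are hypothetical, and the defaults (16 kHz, 30 ms, aggressiveness 2) are assumptions, not settings taken from this package.

```python
def split_frames(pcm, sample_rate=16000, frame_ms=30, sample_width=2):
    """Yield fixed-size frames of 16-bit mono PCM; a trailing partial frame is dropped."""
    if sample_rate not in (8000, 16000, 32000, 48000):
        raise ValueError("unsupported sample rate: %d" % sample_rate)
    if frame_ms not in (10, 20, 30):
        raise ValueError("unsupported frame length: %d ms" % frame_ms)
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def speech_flags(pcm, sample_rate=16000, aggressiveness=2):
    """Return one True/False speech decision per 30 ms frame."""
    import webrtcvad  # imported lazily so split_frames works without it

    vad = webrtcvad.Vad(aggressiveness)  # 0 (least) to 3 (most aggressive)
    return [vad.is_speech(frame, sample_rate)
            for frame in split_frames(pcm, sample_rate)]
```

At 16 kHz, each 30 ms frame is 480 samples (960 bytes), so one second of audio yields 33 full frames and the remaining 320 bytes are discarded.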

tbd_audio_recognition_deepspeech

This package is a wrapper for Mozilla's open source implementation of DeepSpeech. It takes in both the VAD output and the stamped audio and publishes the detected utterances.

tbd_amazon_transcribe

This package is a wrapper for Amazon's AWS Transcribe service. It takes in both the VAD output and the stamped audio and publishes the detected utterances.

Quick 10-Step Setup Instructions

Install ROS Melodic.

Install these ROS dependencies:

```
sudo apt install ros-melodic-audio-common*
sudo apt install ros-melodic-audio-capture*
```

Install Python 3 dependencies:
```
sudo apt install python3-venv
```

Create a new ROS workspace and a Python 3 virtual environment, then activate the environment.
```
mkdir catkin_ws && cd catkin_ws
python3 -m venv venv
source venv/bin/activate
```

Install the following python3 dependencies into the virtual environment:
```
pip install webrtcvad deepspeech==0.7.4 rospkg empy alloylib
```
Create and navigate to the src directory.
```
mkdir src && cd src
```

Clone the tbd_audio_stack repo into src.

```
git clone https://github.com/CMU-TBD/tbd_audio_stack.git
```

From the src directory, download the DeepSpeech model files that match the installed version (0.7.4).
```
cd tbd_audio_stack/tbd_audio_recognition_deepspeech && mkdir models && cd models
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.scorer
```
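An interrupted download can leave a missing or empty model file behind, so it may be worth checking the models directory before building. The following is a minimal sketch using only the Python standard library. The helper name `missing_model_files` is hypothetical, and it only checks that each file exists and is non-empty, not that its contents are valid.

```python
from pathlib import Path

# File names taken from the wget commands in the download step.
MODEL_FILES = (
    "deepspeech-0.7.4-models.pbmm",
    "deepspeech-0.7.4-models.scorer",
)


def missing_model_files(models_dir):
    """Return the expected model file names that are absent or empty."""
    models_dir = Path(models_dir)
    missing = []
    for name in MODEL_FILES:
        path = models_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_model_files(".")` from inside the models directory returns an empty list when both files are present and non-empty.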

Go back to the workspace's root directory, then build and run the project. Make sure the Python 3 virtual environment is active.

```
cd ~/<path_to_your_workspace>/catkin_ws
catkin build -DPYTHON_VERSION=3
source devel/setup.bash
roslaunch tbd_audio_recognition_deepspeech run_recognition.launch
```

Everything should now run correctly. To see the text output, run rostopic echo /utterance and speak into your computer's microphone.