
Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

Primary language: C++. License: MIT.

whisper_ros

This repository provides a set of ROS 2 packages that integrate whisper.cpp into ROS 2 using audio_common 4.0.2. In addition, silero-vad is used for VAD (Voice Activity Detection).


ROS 2 distro: Humble (branch: main)

Table of Contents

  1. Table of Contents
  2. Related Projects
  3. Installation
  4. Docker
  5. Usage
  6. Demos

Related Projects

  • chatbot_ros → A chatbot integrated into ROS 2 that uses whisper_ros to listen to people's speech and llama_ros to generate responses. The chatbot is controlled by a state machine created with YASMIN.

Installation

To run whisper_ros with CUDA, you must first install the CUDA Toolkit.

$ cd ~/ros2_ws/src
$ git clone https://github.com/mgonzs13/audio_common.git
$ git clone https://github.com/mgonzs13/whisper_ros.git
$ pip3 install -r whisper_ros/requirements.txt
$ cd ~/ros2_ws
$ rosdep install --from-paths src --ignore-src -r -y
$ colcon build --cmake-args -DGGML_CUDA=ON # include --cmake-args -DGGML_CUDA=ON only when building with CUDA
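
The CUDA flag in the build step above is optional, so the same command covers both CPU and CUDA builds. As a minimal sketch, the Python snippet below assembles that command either way. The helper name `colcon_build_args` is hypothetical and not part of whisper_ros. The only flags it uses are the ones shown in the install steps.

```python
import shlex


def colcon_build_args(use_cuda: bool) -> list[str]:
    """Return the colcon build command from the install steps as an argv list.

    Hypothetical helper for illustration only; not part of whisper_ros.
    """
    args = ["colcon", "build"]
    if use_cuda:
        # Flag copied from the install steps; only added for CUDA builds.
        args += ["--cmake-args", "-DGGML_CUDA=ON"]
    return args


if __name__ == "__main__":
    # Print both variants so they can be pasted into a shell in ~/ros2_ws.
    print(shlex.join(colcon_build_args(use_cuda=False)))
    print(shlex.join(colcon_build_args(use_cuda=True)))
```

Returning an argv list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.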

Docker

Build the whisper_ros Docker image. Optionally, you can build whisper_ros with CUDA (USE_CUDA) and choose the CUDA version (CUDA_VERSION). When building the image with CUDA, you must set DOCKER_BUILDKIT=0.

$ DOCKER_BUILDKIT=0 docker build -t whisper_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .
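
The build options above depend on each other: `DOCKER_BUILDKIT=0` and the two build arguments only matter for CUDA images. The Python sketch below makes that dependency explicit. `docker_build_command` and `render` are hypothetical helper names, not part of whisper_ros. It also assumes that a plain `docker build -t whisper_ros .` produces a CPU-only image, which the README implies but does not state.

```python
import shlex


def docker_build_command(use_cuda: bool, cuda_version: str = "12-6"):
    """Return (extra_env, argv) for building the whisper_ros image.

    Hypothetical helper for illustration only. The build arguments are
    the ones named in this README; the CPU-only form is an assumption.
    """
    env = {}
    argv = ["docker", "build", "-t", "whisper_ros"]
    if use_cuda:
        # BuildKit must be disabled when compiling with CUDA (per the README).
        env["DOCKER_BUILDKIT"] = "0"
        argv += [
            "--build-arg", "USE_CUDA=1",
            "--build-arg", f"CUDA_VERSION={cuda_version}",
        ]
    argv.append(".")  # build context: the repository root
    return env, argv


def render(env, argv) -> str:
    """Format the command as a single shell line with env-var prefixes."""
    prefix = [f"{key}={value}" for key, value in env.items()]
    return " ".join(prefix + [shlex.join(argv)])


if __name__ == "__main__":
    print(render(*docker_build_command(use_cuda=True)))
    print(render(*docker_build_command(use_cuda=False)))
```

To run the build from Python instead of printing it, the pair can be passed to `subprocess.run(argv, env={**os.environ, **env})`.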

Run the Docker container. To use CUDA, you have to install the NVIDIA Container Toolkit and add --gpus all.

$ docker run -it --rm --gpus all whisper_ros

Usage

Run Silero for VAD and Whisper for STT:

$ ros2 launch whisper_bringup whisper.launch.py

Demos

Send an action goal to start listening:

$ ros2 action send_goal /whisper/listen whisper_msgs/action/STT "{}"

Or run the example whisper client:

$ ros2 run whisper_demos whisper_demo_node