curious-broccoli/speech_pipeline

Record speech with a microphone, translate it, and synthesize the translated speech (en->de, de->en)

Python

Speech pipeline

Linux Python project to:

recognize human speech (German or English), from either a microphone or a video
then translate it into the other language (English to German, or German to English)
then convert it into speech (text-to-speech).
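
The three stages above can be pictured as a simple chain of functions. The following is only a minimal sketch of that structure, not the project's actual code: the `SpeechPipeline` class and the `fake_*` stand-in stages are hypothetical names invented for illustration, and in the real project each stage would be backed by a speech recognition, translation, or text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechPipeline:
    """Hypothetical container chaining the three stages described above."""

    recognize: Callable[[bytes], str]   # audio -> source-language text
    translate: Callable[[str], str]     # source text -> target-language text
    synthesize: Callable[[str], bytes]  # target text -> audio

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        translated = self.translate(text)
        return self.synthesize(translated)


# Stand-in stages (assumptions for illustration only, no real models involved).
def fake_recognize(audio: bytes) -> str:
    # Pretend the "audio" bytes already contain the spoken text.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Tiny en->de word lookup in place of a translation model.
    glossary = {"hello": "hallo", "world": "welt"}
    return " ".join(glossary.get(word, word) for word in text.split())


def fake_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


pipeline = SpeechPipeline(fake_recognize, fake_translate, fake_synthesize)
result = pipeline.run(b"hello world")
print(result)
```

Keeping each stage behind a plain callable makes it easy to swap the microphone for a video file, or one translation direction for the other, without touching the rest of the chain.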

Installation

Tested on Ubuntu 22.04.1 LTS with Python 3.10.4 and pip 22.2.2

Clone the repository, change into its directory, and run bash install.sh
Confirm the installation of the programs the script needs
Activate the virtual environment with source ~/venv_speech_pipeline/bin/activate
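
To verify that the virtual environment is really active before running the pipeline, a small check like the one below may help. It is a sketch, not part of the repository: the function names are hypothetical, and the expected location is taken from the activation command above.

```python
import sys
from pathlib import Path

# Location taken from the activation command above.
EXPECTED_VENV = Path.home() / "venv_speech_pipeline"


def in_virtualenv() -> bool:
    """Return True if the interpreter runs inside any virtual environment."""
    return sys.prefix != sys.base_prefix


def in_expected_venv() -> bool:
    """Return True only if the active environment is the one created by the installer."""
    return in_virtualenv() and Path(sys.prefix).resolve() == EXPECTED_VENV.resolve()


if __name__ == "__main__":
    if in_expected_venv():
        print("venv_speech_pipeline is active")
    else:
        print("venv_speech_pipeline is not active; run the activate command first")
```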

Models

All machine learning models will automatically be downloaded the first time they are needed:

Vosk models in ~/.cache/vosk/ (more than 1 GB each)
Marian models in the working (git) directory
TTS models in ~/.local/share/tts/
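
Because the downloads are large, it can be useful to see how much disk space the cached models already occupy. The sketch below assumes only the two cache directories listed above; the helper name `dir_size_bytes` is hypothetical. The Marian models are left out because their exact location inside the working directory is not stated here.

```python
from pathlib import Path

# Cache locations as listed above.
MODEL_DIRS = {
    "vosk": Path.home() / ".cache" / "vosk",
    "tts": Path.home() / ".local" / "share" / "tts",
}


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files below path; 0 if path is not a directory."""
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


if __name__ == "__main__":
    for name, path in MODEL_DIRS.items():
        size_mb = dir_size_bytes(path) / 1_000_000
        print(f"{name}: {path} ({size_mb:.1f} MB)")
```

A size of 0 MB simply means the models for that stage have not been downloaded yet.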

Usage

Run python3 process_speech.py {mic,video} --help for more information

From a video file

Run python3 process_speech.py video [file]

From a microphone

Run python3 process_speech.py mic
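
The mic and video subcommands could be wired up with `argparse` subparsers roughly as follows. This is an illustrative sketch, not the actual contents of process_speech.py: the `build_parser` function, the `source` attribute, the help texts, and the choice to make `file` a required positional argument are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per input source (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="process_speech.py")
    subparsers = parser.add_subparsers(dest="source", required=True)

    # "mic" takes no further positional arguments in this sketch.
    subparsers.add_parser("mic", help="read speech from a microphone")

    # "video" expects the path of the file to process (assumed to be required).
    video = subparsers.add_parser("video", help="read speech from a video file")
    video.add_argument("file", help="path to the video file")

    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["video", "talk.mp4"])
print(args.source, args.file)
```

With this layout, `--help` after either subcommand prints only the options that apply to that input source.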