/speech_pipeline

Record speech with a microphone, translate it, and synthesize the translated speech (en->de, de->en)


Speech pipeline

A Python project for Linux that:

  • recognizes human speech (German or English) from either a microphone or a video
  • translates it into the other language (English or German)
  • converts the translation into speech (text-to-speech).
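
The three steps above can be pictured as a chain of interchangeable stages. The following is only a minimal sketch, not the project's actual code: the names SpeechPipeline, Recognizer, Translator, Synthesizer and target_language are hypothetical, and the stages are plain callables standing in for the real speech recognition, translation and text-to-speech backends.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration only).
Recognizer = Callable[[bytes], str]    # raw audio -> recognized text
Translator = Callable[[str], str]      # source text -> translated text
Synthesizer = Callable[[str], bytes]   # translated text -> synthesized audio


@dataclass
class SpeechPipeline:
    """Chains recognition, translation and synthesis in that order."""

    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        translated = self.translate(text)
        return self.synthesize(translated)


def target_language(source: str) -> str:
    """Return the other language of the supported pair (en <-> de)."""
    pairs = {"en": "de", "de": "en"}
    if source not in pairs:
        raise ValueError(f"unsupported language: {source!r}")
    return pairs[source]
```

Keeping each stage behind a simple callable makes it possible to test the chain with stand-in functions before any model has been downloaded.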

Installation

Tested on Ubuntu 22.04.1 LTS with Python 3.10.4 and pip 22.2.2

  • Clone the repository, change into it, and run bash install.sh
  • Confirm the installation of the programs that the script needs
  • Activate the virtual environment: source ~/venv_speech_pipeline/bin/activate

Models

All machine learning models will automatically be downloaded the first time they are needed:

  • Vosk models in ~/.cache/vosk/ (more than 1 GB each)
  • Marian models in the working (git) directory
  • TTS models in ~/.local/share/tts/
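
Because the models are large, it can be useful to check how much disk space the caches already occupy. The sketch below is an illustration under assumptions: the helper names model_cache_dirs and cache_report are hypothetical, and only the two home-directory locations listed above are covered (the Marian location depends on the working directory, so it is left out).

```python
from pathlib import Path
from typing import Dict, Optional


def model_cache_dirs(home: Optional[Path] = None) -> Dict[str, Path]:
    """Return the cache directories named in this README, relative to home."""
    home = Path.home() if home is None else home
    return {
        "vosk": home / ".cache" / "vosk",
        "tts": home / ".local" / "share" / "tts",
    }


def cache_report(home: Optional[Path] = None) -> Dict[str, int]:
    """Return the total size in bytes of each cache (0 if it does not exist)."""
    report = {}
    for name, directory in model_cache_dirs(home).items():
        if directory.is_dir():
            # Sum the sizes of all regular files below the cache directory.
            report[name] = sum(
                f.stat().st_size for f in directory.rglob("*") if f.is_file()
            )
        else:
            report[name] = 0
    return report


if __name__ == "__main__":
    for name, size in cache_report().items():
        print(f"{name}: {size / 1e9:.2f} GB")
```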

Usage

Run python3 process_speech.py {mic,video} --help for more information

From a video file

Run python3 process_speech.py video [file]

From a microphone

Run python3 process_speech.py mic
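
For readers curious how a command line with mic and video subcommands could be structured, here is a minimal argparse sketch. It is not taken from process_speech.py: the function name build_parser, the attribute name source and the help texts are assumptions, and the real script may accept further options or treat the file argument differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with 'mic' and 'video' subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="process_speech.py")
    subparsers = parser.add_subparsers(dest="source", required=True)

    # 'mic' takes no further arguments in this sketch.
    subparsers.add_parser("mic", help="read speech from the microphone")

    # 'video' takes the path of the video file; treated as required here,
    # which is an assumption.
    video = subparsers.add_parser("video", help="read speech from a video file")
    video.add_argument("file", help="path to the video file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With subparsers defined this way, argparse generates the per-subcommand --help output automatically.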