/background-transcription

Python script that uses a pre-trained audio model (trained with DeepSpeech) for background VAD (Voice Activity Detection), consequent transcription and execution of transcribed command on a Linux distribution.

Primary LanguagePythonMIT LicenseMIT

DeepSpeech - Background Transcription (pt-BR, Brazilian Portuguese)

Leia em Português.

Python script that uses a self pre-trained audio model (trained with DeepSpeech) for background VAD (Voice Activity Detection), transcription and transcribed command execution on Linux.

Description

This script utilizes a self trained audio model by DeepSpeech's Tensorflow machine learning implementation. Besides that, the script is supposed to run as a background process and record all real-time audio. When the VAD detects some sort of speech, it records everything until there is silence. This portion of audio is transcribed using DeepSpeech's CLI and the pre-trained audio model. The transcribed result is sanitized and searched in a dictionary (with confiability calculated by fuzzywuzzy) and if it matches something, this command is executed in a subprocess shell and the STDOUT output is shown in a simple GUI, made with eel.

This repository also includes a simple API to run DeepSpeech's transcription remotely (or on localhost), made by myself.

Installation

Before installing python modules, don't forget the prerequisites for the monster called pyaudio:

sudo apt-get install -y portaudio19-dev libjack-jackd2-dev

Proceeding, install python modules (highly recommend to do this in virtual environment, don't mess with python):

# Install python3.7 and virtualenv
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get install -y python3.7 python3.7-dev virtualenv

# Initialize virtualenv in a folder called deepspeech_app
virtualenv -p python3.7 deepspeech_app

# Activate virtualenv
cd deepspeech_app
source bin/activate

# Install python modules
python -m pip install --no-cache-dir -r requirements.txt

If you want to use the API, install Node.js (recommend you to install LTS version) and simply run on root project directory:

# Install all necessary packages
npm install server/

Usage

After installing everything, you should be able to either record real-time voice and translate that into commands or use an audio file as the recording (add -f option and the filename .wav):

# Command for real time translation (remove 'nohup' and '& exit' to run on terminal)
nohup python commands.py -m models/deepspeech_model_ptBR.pb & exit

If you want to use the API for transcription, run:

# Run node.js API on localhost or remote server
node server/server.js

# Run python script with API address
nohup python commands.py -m models/deepspeech_model_ptBR.pb --remote http://localhost:5000/services/transcribe & exit

Available commands (pt-BR, Brazilian Portuguese only)

Transcrição Comando Descrição
caminho pwd Show actual absolute path
criar aquivo "arquivo" touch "arquivo" Creates new file
criar pasta "pasta" mkdir "pasta" Creates new folder
disco df -h Show disk information
fechar exit Finishes program execution
limpar - Clean all UI text
listar ls List all files on current directory
localizar "arquivo" locate -i "arquivo" Locates a file in the system
processos ps aux List all processes in execution
tempo uptime Show for how much time the system is up
transcrever "frase" - Transcribes a phrase

Available programs

Program Command
chrome google-chrome
code code .
discord discord
firefox firefox
gedit gedit
gimp gimp
spotify spotify

GIT integration

Transcrição Comando
adicionar arquivos git add .
commitar "mensagem" git commit -m "mensagem"
empurrar git push origin main
estado git status
puxar git pull origin main

Collaborators

Turma de Projetos Transversais em Redes de Comunicação 2 da UNB no semestre de 2020/1.