TranscribingAI

This repo contains an AI model that uses OpenAI's Whisper model together with the Mistral-7b-openorca, SeamlessM4T, or Helsinki-NLP model to transcribe and translate microphone input.

Primary language: Python. License: MIT.

Real Time Whisper Transcription

This performs real-time speech-to-text with OpenAI's Whisper model. It works by constantly recording audio in a background thread and concatenating the raw bytes from successive recordings.
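
The record-and-concatenate idea can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual code: `fake_recorder`, `drain`, and `pcm16_to_floats` are hypothetical names, the microphone is replaced by two hard-coded chunks, and the 16-bit little-endian PCM format and the scaling to floats are assumptions.

```python
import queue
import struct
import threading


def fake_recorder(out_queue, chunks):
    # Hypothetical stand-in for a microphone callback running in a thread.
    for chunk in chunks:
        out_queue.put(chunk)


def drain(audio_queue):
    # Join every chunk currently waiting in the queue into one bytes object.
    parts = []
    while True:
        try:
            parts.append(audio_queue.get_nowait())
        except queue.Empty:
            break
    return b"".join(parts)


def pcm16_to_floats(raw):
    # Decode little-endian 16-bit samples and scale them to [-1.0, 1.0).
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return [s / 32768.0 for s in samples]


audio_queue = queue.Queue()
# Two made-up "recordings", each holding two 16-bit samples.
recordings = [struct.pack("<2h", 0, 16384), struct.pack("<2h", -32768, 8192)]

worker = threading.Thread(target=fake_recorder, args=(audio_queue, recordings))
worker.start()
worker.join()  # a real recorder keeps running; joining keeps this sketch deterministic

phrase_bytes = b""
phrase_bytes += drain(audio_queue)  # concatenate raw bytes across recordings
samples = pcm16_to_floats(phrase_bytes)
print(len(phrase_bytes), samples)  # prints: 8 [0.0, 0.5, -1.0, 0.25]
```

In a live setup the accumulated samples would be passed to the speech model each time new audio arrives, and the buffer would be reset once a phrase is considered finished.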

To install dependencies simply run

pip install -r requirements.txt

in an environment of your choosing.

For non-English translation, this service uses either Mistral-7b-openorca (commented out) or Meta's SeamlessM4T from 2023 (currently active). Please ensure that the model you use is installed; refer to each model's link below for installation requirements.

Whisper also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
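
Before starting the transcription, it can help to confirm that ffmpeg is actually reachable. The following is a small sketch using only the Python standard library; the helper name `find_ffmpeg` is hypothetical and not part of this repository.

```python
import shutil


def find_ffmpeg(name="ffmpeg"):
    # Return the full path of the executable if it is on PATH, else None.
    return shutil.which(name)


path = find_ffmpeg()
if path:
    message = f"ffmpeg found at {path}"
else:
    message = "ffmpeg not found on PATH; install it with one of the commands above"
print(message)
```

If the check reports that ffmpeg is missing right after installation, opening a new shell usually refreshes PATH.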

For more information on Whisper please see https://github.com/openai/whisper
For more information on Mistral-7b-openorca please see https://huggingface.co/Open-Orca/Mistral-7b-openorca
For more information on SeamlessM4T please see https://github.com/facebookresearch/seamless_communication
For more information on Helsinki-NLP please see https://huggingface.co/Helsinki-NLP/opus-mt-zh-en

SeamlessM4T Credits:
Title: "SeamlessM4T—Massively Multilingual & Multimodal Machine Translation"
Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Çelebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang

Journal: ArXiv

Year: 2023

Helsinki Credits:
Title: OPUS-MT — Building open translation services for the World
Authors: Jörg Tiedemann and Santhosh Thottingal
Year: 2020
Address: Lisbon, Portugal

The code in this repository is public domain.