speech-to-text

There are 3,154 repositories under the speech-to-text topic.

ggerganov/whisper.cpp
Port of OpenAI's Whisper model in C/C++
Language: C++ 35.8k 311 1.4k 3.6k
mozilla/DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Language: C++ 25.4k 671 2.1k 4k
leon-ai/leon
🧠 Leon is your open-source personal assistant.
Language: TypeScript 15.5k 259 2991.3k
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Language: Shell 14.3k 692 1.7k 5.3k
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python 12.6k 125 7441.1k
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python 12.6k 139 7171.3k
jianchang512/pyvideotrans
Translate a video from one language to another and add dubbing, with support for speech recognition transcription, speech synthesis, and subtitle translation.
Language: Python 10.8k 67 5891.2k
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python 9k 136 1.1k 1.4k
Uberi/speech_recognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Language: Python 8.4k 277 6102.4k
alphacep/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Language: Jupyter Notebook 8.1k 119 1.5k 1.1k
nl8590687/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
Language: Python 7.9k 183 2901.9k
TalAter/annyang
💬 Speech recognition for your site
Language: JavaScript 6.6k 240 3431k
snakers4/silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Language: Jupyter Notebook 5k 86 131315
sanchit-gandhi/whisper-jax
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Language: Jupyter Notebook 4.4k 43 181385
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Language: Jupyter Notebook 3.7k 49 203329
modelscope/FunClip
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with integrated LLM-based AI clipping.
Language: Python 3.7k 37 91406
k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
Language: C++ 3.7k 55 558424
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Language: Python 3.6k 44 88371
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language: Python 3.5k 38 132316
tensorflow/lingvo
Lingvo
Language: Python 2.8k 118 254445
toverainc/willow
Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
Language: C 2.6k 44 15996
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python 2.6k 28 46174
jianchang512/stt
Voice Recognition to Text Tool: an offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
Language: Python 2.5k 11 88277
coqui-ai/STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Language: C++ 2.3k 62 183278
pannous/tensorflow-speech-recognition
🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
Language: Python 2.2k 190 70638
ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
Language: Python 2.1k 30 159380
KoljaB/RealtimeSTT
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Language: Python 2.1k 31 101188
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Language: Python 2.1k 30 155156
jarikomppa/soloud
Free, easy, portable audio engine for games
Language: C 1.8k 63 264284
mesolitica/NLP-Models-Tensorflow
Gathers machine learning and Tensorflow deep learning models for NLP problems, 1.13 < Tensorflow < 2.0
Language: Jupyter Notebook 1.8k 96 29729
kalliope-project/kalliope
Kalliope is a framework that will help you to create your own personal assistant.
Language: Python 1.7k 82 332228
bugbakery/audapolis
an editor for spoken-word audio with automatic transcription
Language: TypeScript 1.7k 26 25540
pluja/whishper
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
Language: Svelte 1.6k 27 11092
NVIDIA/OpenSeq2Seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Language: Python 1.6k 93 256369
DragonComputer/Dragonfire
the open-source virtual assistant for Ubuntu based Linux distributions
Language: Python 1.4k 85 102211
Purfview/whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
1.3k 32 18266
