automatic-speech-recognition

There are 360 repositories under the automatic-speech-recognition topic.

wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language:Python4.9k 90 1.1k1.2k
zzw922cn/awesome-speech-recognition-speech-synthesis-papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
3.1k 183 7514
ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
Language:Python3k 30 220535
zzw922cn/Automatic_Speech_Recognition
End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow
Language:Python2.8k 143 90533
coqui-ai/STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Language:C++2.5k 59 183300
FireRedTeam/FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
Language:Python1.6k 17 95138
TEN-framework/ten-vad
Voice Activity Detector (VAD) : low-latency, high-performance and lightweight
Language:C1.6k 20 38130
kakaobrain/pororo
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing
Language:Python1.3k 38 0223
TensorSpeech/TensorFlowASR
:zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subwords
Language:Python995 24 212239
FluidInference/FluidAudio
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Language:Swift853 40 35108
jitsi/jiwer
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Language:Python816 12 57106
snakers4/open_stt
Open STT
Language:Python810 54 3984
EmulationAI/awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
701 25 344
shirayu/whispering
Streaming transcriber with whisper
Language:Python691 18 4151
Picovoice/cheetah
On-device streaming speech-to-text engine powered by deep learning
Language:Python640 27 8674
hirofumi0810/neural_sp
End-to-end ASR/LM implementation with PyTorch
Language:Python594 33 89138
vilassn/whisper_android
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Language:C++566 6 3796
YoavRamon/awesome-kaldi
This is a list of features, scripts, blogs and resources for better using Kaldi (http://kaldi-asr.org/)
539 25 084
Z-yq/TensorflowASR
A project dedicated to bringing CPU/on-device model performance close to that of GPU models, with a real-time factor (RTF) below 0.1 on CPU
Language:Python475 20 50114
jonatasgrosman/huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
Language:Python466 16 4946
Picovoice/leopard
On-device speech-to-text engine powered by deep learning
Language:Python464 17 5530
double22a/speech_dataset
The dataset of Speech Recognition
433 8 378
ArthurFDLR/whisper-youtube
🔉 Youtube Videos Transcription with OpenAI's Whisper
Language:Jupyter Notebook407 6 12117
leduckhai/MultiMed
[LREC-COLING 2024 (Oral), Interspeech 2024 (Oral), NAACL 2025, ACL 2025] A Series of Multilingual Multitask Medical Speech Processing
Language:Python362 6 336
hirofumi0810/tensorflow_end2end_speech_recognition
End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Language:Python315 33 18119
m3hrdadfi/soxan
Wav2Vec for speech recognition, classification, and audio classification
Language:Jupyter Notebook267 7 2038
NavodPeiris/speechlib
speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names.
Language:Python240 5 1824
smeetrs/deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Language:Python237 6 4442
rolczynski/Automatic-Speech-Recognition
🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
Language:Python225 11 3063
bricewalker/Hey-Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
Language:Jupyter Notebook199 9 1240
anton-jeran/FAST-RIR
This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Language:Python173 5 432
sovaai/sova-asr
SOVA ASR (Automatic Speech Recognition)
Language:Python173 12 2422
biodatlab/thonburian-whisper
Thonburian Whisper: Open models for fine-tuned Whisper in Thai. Try our demo on Huggingface space:
Language:Jupyter Notebook167 4 518
noco-ai/spellbook-docker
AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models
Language:Shell163 4 314
CoEDL/elpis
🙊 software for creating speech recognition models.
Language:Python159 15 17632
dangvansam/viet-asr
VietASR - Vietnamese Automatic Speech Recognition
Language:Python156 4 858
