asr

There are 1314 repositories under the asr topic.

m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python 17.8k 157 8691.9k
NVIDIA-NeMo/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language: Python 15.7k 229 2.8k 3.1k
alphacep/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Language: Jupyter Notebook 13.2k 133 1.7k 1.6k
PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language: Python 12.2k 188 2k 1.9k
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python 10.4k 134 1.2k 1.6k
k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and a websocket server/client, with support for 12 programming languages.
Language: C++ 7.4k 87 1k 867
wzpan/wukong-robot
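🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot/smart speaker project. It supports ChatGPT multi-turn conversation and may be the first open-source smart speaker project to support brain-computer interaction.
Language: Python 7k 176 3071.4k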
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language: Python 6.6k 57 234601
jdepoix/youtube-transcript-api
This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike other Selenium-based solutions.
Language: Python 6.2k 47 377644
snakers4/silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Language: Jupyter Notebook 5.5k 87 135346
xiangyuecn/Recorder
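HTML5 JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC and some Android and iOS browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call/chat example, and DTMF encoding and decoding.
Language: JavaScript 5.4k 84 2821.1k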
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Language: Jupyter Notebook 5k 47 254462
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language: Python 4.8k 93 1.1k 1.2k
PeterH0323/Streamer-Sales
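Streamer-Sales (销冠, "Sales Champion"): a sales-streamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to stimulate viewers' desire to buy. 🚀⭐ Includes a detailed data generation pipeline❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG retrieval-augmented generation 📚, TTS text-to-speech 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR speech-to-text 🎙️, a Vue-based frontend 🍍, a FastAPI backend 🗝️, and Docker-compose packaging and deployment 🐋
Language: Python 3.5k 45 31490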
ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
Language: Python 2.9k 32 218514
tensorflow/lingvo
Lingvo
Language: Python 2.9k 116 255452
CheshireCC/faster-whisper-GUI
faster_whisper GUI with PySide6
Language: Python 2.7k 20 291153
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Language: Python 2.6k 34 161197
coqui-ai/STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Language: C++ 2.5k 61 183298
Purfview/whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
2.5k 51 272125
mravanelli/pytorch-kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Language: Python 2.4k 92 214446
umlx5h/LLPlayer
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
Language: C# 2.1k 12 50108
harry0703/AudioNotes
Quickly extract content from audio and video and organize it into a structured Markdown note.
Language: Python 1.9k 12 37264
Delta-ML/delta
DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/
Language: Python 1.6k 64 75288
k2-fsa/sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
Language: C++ 1.5k 34 173196
wwbin2017/bailing
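Bailing (百聆) is a GPT-4o-like voice dialogue robot built from ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, achieves latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption.
Language: Python 1.4k 15 53242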
FireRedTeam/FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
Language: Python 1.3k 16 56102
lenML/Speech-AI-Forge
🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.
Language: Python 1.3k 18 179180
R3gm/SoniTranslate
Synchronized Translation for Videos. Video dubbing
Language: Python 1.2k 25 151274
mravanelli/SincNet
SincNet is a neural architecture for efficiently processing raw audio samples.
Language: Python 1.2k 33 106269
alphacep/vosk-server
WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Language: Python 1.2k 51 212309
ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language: Python 1.1k 14 1786
mkiol/dsnote
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Language: C++ 1.1k 22 30645
yeyupiaoling/Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
Language: C 1.1k 10 116198
Henry-23/VideoChat
Real-time voice interactive digital human, supporting an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3 s.
Language: Python 1.1k 12 62142
sooftware/conformer
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
Language: Python 1.1k 7 37187
