asr

There are 1228 repositories under the asr topic.

  • m-bain/whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Language:Python14.7k1457921.6k
  • NVIDIA/NeMo

    A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

    Language:Python13.5k2162.5k2.8k
  • PaddlePaddle/PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

    Language:Python11.7k1872k1.9k
  • speechbrain/speechbrain

    A PyTorch-based Speech Toolkit

    Language:Python9.6k1341.1k1.5k
  • alphacep/vosk-api

    Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

    Language:Jupyter Notebook9.1k1201.6k1.2k
  • wzpan/wukong-robot

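    🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports ChatGPT multi-turn conversation and may also be the first open-source smart speaker project to support brain-computer interaction.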

    Language:Python6.7k1753051.4k
  • k2-fsa/sherpa-onnx

    Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, support 11 programming languages

    Language:C++5.4k73801611
  • TEN-framework/TEN-Agent

    TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.

    Language:Python5.4k54217609
  • snakers4/silero-models

    Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

    Language:Jupyter Notebook5.2k85131331
  • FunAudioLLM/SenseVoice

    Multilingual Voice Understanding Model

    Language:Python5.1k51186466
  • xiangyuecn/Recorder

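    HTML5 JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC and some Android and iOS browsers, Hybrid Apps (Android and iOS App source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call and chat example, and DTMF encoding and decoding.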

    Language:JavaScript5.1k802681.1k
  • NexaAI/nexa-sdk

    Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.

    Language:Python4.5k423106623
  • wenet-e2e/wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

    Language:Python4.4k911.1k1.1k
  • MahmoudAshraf97/whisper-diarization

    Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

    Language:Jupyter Notebook4.3k46242396
  • jdepoix/youtube-transcript-api

    This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other selenium based solutions do!

    Language:Python3.7k37289421
  • PeterH0323/Streamer-Sales

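    Streamer-Sales (销冠, "sales champion"): a sales-streamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to stimulate users' desire to buy. 🚀⭐ Includes a detailed data generation workflow ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG retrieval-augmented generation 📚, TTS text-to-speech 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR speech-to-text 🎙️, a frontend built with the Vue ecosystem 🍍, a backend built with FastAPI 🗝️, and Docker-compose packaging and deployment 🐋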

    Language:Python3.1k4531482
  • tensorflow/lingvo

    Lingvo

    Language:Python2.8k117254448
  • ahmetoner/whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

    Language:Python2.5k30191444
  • coqui-ai/STT

    🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

    Language:C++2.4k61183282
  • mravanelli/pytorch-kaldi

    pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

    Language:Python2.4k92215445
  • linto-ai/whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Language:Python2.3k34161179
  • CheshireCC/faster-whisper-GUI

    faster_whisper GUI with PySide6

    Language:Python2.2k20267130
  • Purfview/whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

  • Delta-ML/delta

    DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/

    Language:Python1.6k6475288
  • harry0703/AudioNotes

    Quickly extract the content of audio and video files and organize it into structured Markdown notes.

    Language:Python1.6k1237226
  • k2-fsa/sherpa-ncnn

    Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.

    Language:C++1.2k34158171
  • mravanelli/SincNet

    SincNet is a neural architecture for efficiently processing raw audio samples.

    Language:Python1.2k33106264
  • lenML/Speech-AI-Forge

    🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.

    Language:Python1.1k18166149
  • R3gm/SoniTranslate

    Synchronized Translation for Videos. Video dubbing

    Language:Python1.1k19141218
  • ictnlp/StreamSpeech

    StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

    Language:Python1k131680
  • sooftware/conformer

    [Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

    Language:Python1k737184
  • pykaldi/pykaldi

    A Python wrapper for Kaldi

    Language:Python1k38278246
  • alphacep/vosk-server

    WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries

    Language:Python1k54209273
  • wwbin2017/bailing

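    Bailing (百聆) is a GPT-4o-like voice dialogue robot implemented with ASR+LLM+TTS. It integrates strong large models such as DeepSeek R1, achieves latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption.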

    Language:Python9991435175
  • yeyupiaoling/Whisper-Finetune

    Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment

    Language:C99410103160
  • athena-team/athena

    An open-source implementation of a sequence-to-sequence based speech processing engine

    Language:C++94336137189