speech

There are 1866 repositories under the speech topic.

coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language: Python 43.4k 331 1.2k 5.7k
babysor/MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
Language: Python 36.7k 301 890 5.3k
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
Language: Python 27.8k 183 130 5.1k
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Language: Python 20.8k 281 3.2k 3k
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python 18.6k 152 8962k
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Language: Jupyter Notebook 17.1k 117 414 1.6k
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Language: Shell 15.2k 682 1.7k 5.4k
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language: Python 10.2k 131 56860
mozilla/TTS
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Language: Jupyter Notebook 10k 184 569 1.3k
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
Language: Python 8.4k 78 818878
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Language: Python 8.4k 71 165734
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python 7.3k 58 309662
PaddlePaddle/models
Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.
Language: Python 6.9k 261 2k 2.9k
TalAter/annyang
💬 Speech recognition for your site
Language: JavaScript 6.7k 233 3441k
snakers4/silero-models
Silero Models: pre-trained text-to-speech models made embarrassingly simple
Language: Jupyter Notebook 5.5k 87 136348
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Language: Jupyter Notebook 5.1k 46 256476
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
Language: Python 4.3k 54 94343
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Language: Python 4.2k 49 103484
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
Language: Python 4.2k 80 130693
jianchang512/stt
Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
Language: Python 4k 15 128430
modelscope/ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Language: Python 3.6k 38 125291
Rikorose/DeepFilterNet
Noise suppression using deep filtering
Language: Python 3.5k 34 322343
shu223/iOS-10-Sampler
Code examples for new APIs of iOS 10.
Language: Swift 3.3k 105 11335
avinashkranjan/Amazing-Python-Scripts
🚀 Curated collection of Amazing Python scripts from Basic to Advanced, with automation task scripts.
Language: Jupyter Notebook 3.3k 40 1.2k 1.2k
ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
Language: Python 3k 32 218533
tensorflow/lingvo
Lingvo
Language: Python 2.9k 111 255452
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
Language: Jupyter Notebook 2.8k 37 53247
hahahumble/speechgpt
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
Language: TypeScript 2.8k 21 47393
pytorch/audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
Language: Python 2.8k 66 984739
readbeyond/aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Language: Python 2.8k 69 215266
IAHispano/Applio
A simple, high-quality voice conversion tool focused on ease of use and performance.
Language: Python 2.7k 32 616451
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Language: Python 2.7k 36 166200
pndurette/gTTS
Python library and CLI tool to interface with Google Translate's text-to-speech API
Language: Python 2.5k 63 217384
mravanelli/pytorch-kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Language: Python 2.4k 89 214446
r9y9/wavenet_vocoder
WaveNet vocoder
Language: Python 2.4k 95 193496
jarikomppa/soloud
Free, easy, portable audio engine for games
Language: C 2k 57 276320
