speech
There are 1826 repositories under the speech topic.
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
babysor/MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
IDEA-Research/Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
mozilla/TTS
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
PaddlePaddle/models
Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
TalAter/annyang
💬 Speech recognition for your site
snakers4/silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT-4o
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
jianchang512/stt
Voice Recognition to Text Tool: an offline, locally run audio/video-to-subtitle tool that outputs JSON, SRT subtitles, and plain text
modelscope/ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Rikorose/DeepFilterNet
Noise suppression using deep filtering
shu223/iOS-10-Sampler
Code examples for new APIs of iOS 10.
avinashkranjan/Amazing-Python-Scripts
🚀 Curated collection of Amazing Python scripts, from basic to advanced, including automation task scripts.
ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
tensorflow/lingvo
Lingvo
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
hahahumble/speechgpt
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
readbeyond/aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
pytorch/audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
OvidijusParsiunas/deep-chat
Fully customizable AI chatbot component for your website
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
IAHispano/Applio
A simple, high-quality voice conversion tool focused on ease of use and performance.
pndurette/gTTS
Python library and CLI tool to interface with Google Translate's text-to-speech API
mravanelli/pytorch-kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
r9y9/wavenet_vocoder
WaveNet vocoder