waywardspooky's Stars
2noise/ChatTTS
A generative speech model for daily dialogue.
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Vaibhavs10/insanely-fast-whisper
jaywalnut310/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
huggingface/parler-tts
Inference and training library for high-quality TTS models.
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
buaacyw/MeshAnything
From anything to mesh like human artists. Official impl. of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
kadirnar/whisper-plus
WhisperPlus: Faster, Smarter, and More Capable 🚀
xenova/whisper-web
ML-powered speech recognition directly in your browser
zou-group/textgrad
TextGrad: Automatic "Differentiation" via Text, using large language models to backpropagate textual gradients.
sc0ty/subsync
Subtitle Speech Synchronizer
abdeladim-s/subsai
🎞️ Subtitles generation tool (Web-UI + CLI + Python package) powered by OpenAI's Whisper and its variants 🎞️
pydn/ComfyUI-to-Python-Extension
A powerful tool that translates ComfyUI workflows into executable Python code.
erew123/alltalk_tts
AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with third-party software via JSON calls.
transcriptionstream/transcriptionstream
Turnkey self-hosted offline transcription and diarization service with LLM summary
McCloudS/subgen
Autogenerate subtitles using the OpenAI Whisper model via Jellyfin, Plex, Emby, Tautulli, or Bazarr
MasayaKawamura/MB-iSTFT-VITS
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
daswer123/xtts-api-server
A simple FastAPI server to run XTTSv2
YuanGongND/whisper-at
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
tomchang25/whisper-auto-transcribe
Auto-transcription tool based on Whisper
Vaibhavs10/optimise-my-whisper
matatonic/openedai-vision
An OpenAI API-compatible API for chat with image input and questions about the images (a.k.a. multimodal).
Vali-98/XTTS-RVC-UI
A Gradio UI for XTTSv2 and RVC.
metavoiceio/MetaVoiceLive
deepestcyber/vmse2000-detector