ygyuan's Stars
myshell-ai/OpenVoice
Instant voice cloning by MIT and MyShell.
svc-develop-team/so-vits-svc
SoftVC VITS Singing Voice Conversion
RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
jiaaro/pydub
Manipulate audio with a simple and easy high level interface
FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
kyutai-labs/moshi
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
AILab-CVC/YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
OpenNMT/CTranslate2
Fast inference engine for Transformer models
ymcui/Chinese-ELECTRA
Pre-trained Chinese ELECTRA models
innnky/emotional-vits
Emotion-controllable speech synthesis model based on VITS that requires no emotion labels
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
lenML/Speech-AI-Forge
🍦 Speech-AI-Forge is a project built around TTS generation models, implementing an API server and a Gradio-based WebUI.
Plachtaa/seed-vc
zero-shot voice conversion & singing voice conversion, with real-time support
lifeiteng/OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System
facebookresearch/speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
liutaocode/TTS-arxiv-daily
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
thuhcsi/NeuCoSVC
hayeong0/DDDM-VC
Official Pytorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
huangxu1991/GPT-SoVITS-VC
VC Without Retrain!
hhguo/SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
LetterLiGo/SafeEar
SafeEar: Content Privacy-Preserving Audio Deepfake Detection (Accepted by CCS 2024)
smileslab/Comparative-Analysis-Voice-Spoofing
A comparative analysis of voice spoofing detection systems, based on a paper available at https://arxiv.org/abs/2210.00417.
ppmzhang2/seed-vc
zero-shot voice conversion with in-context learning