c9412600's Stars
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
RVC-Boss/GPT-SoVITS
As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning).
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
rhasspy/piper
A fast, local neural text to speech system
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
alibaba-damo-academy/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
fishaudio/fish-speech
Brand new TTS solution
facebookresearch/audio2photoreal
Code and dataset for photorealistic Codec Avatars driven from audio
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
sh-lee-prml/HierSpeechpp
The official implementation of HierSpeech++
NATSpeech/NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
lmnt-com/diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
k2-fsa/sherpa
Speech-to-text server framework with next-gen Kaldi
p0p4k/vits2_pytorch
Unofficial VITS2 TTS implementation in PyTorch
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Rongjiehuang/FastDiff
PyTorch Implementation of FastDiff (IJCAI'22)
wenet-e2e/wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
alibaba-damo-academy/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
janvainer/speedyspeech
kslz/sound_dataset_tools2
A visual tool for quickly building speech datasets
hhguo/MSMC-TTS
Official Implementation of Multi-Stage Multi-Codebook (MSMC) TTS
X-LANCE/StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
MontrealCorpusTools/mfa-models
Collection of pretrained models for the Montreal Forced Aligner
thuhcsi/FlatTN
Chinese Text Normalization and Dataset
p0p4k/vits3_pytorch
faliwang/Universal-Adaptor
zjumml/Revisit-NAR-TTS