c9412600

c9412600's Stars

suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Language: Jupyter Notebook · 33.7k 311 4224k
RVC-Boss/GPT-SoVITS
Even 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Language: Python · 28.5k 187 9243.3k
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Python · 20.2k 194 3642k
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
Language: Jupyter Notebook · 10.5k 139 3331k
rhasspy/piper
A fast, local neural text to speech system
Language: C++ · 5k 68 402346
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python · 4.2k 56 123354
alibaba-damo-academy/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python · 4k 48 841456
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language: Python · 3.9k 90 9871k
fishaudio/fish-speech
Brand new TTS solution
Language: Python · 3.2k 44 190250
facebookresearch/audio2photoreal
Code and dataset for photorealistic Codec Avatars driven from audio
Language: Python · 2.6k 29 51243
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
Language: Python · 1.2k 56 3098
sh-lee-prml/HierSpeechpp
The official implementation of HierSpeech++
Language: Python · 1.1k 58 45134
NATSpeech/NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
Language: Python · 959 20 2699
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Language: Python · 878 26 3362
lmnt-com/diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Language: Python · 733 21 47110
k2-fsa/sherpa
Speech-to-text server framework with next-gen Kaldi
Language: C++ · 473 33 18597
p0p4k/vits2_pytorch
Unofficial VITS2 TTS implementation in PyTorch
Language: Python · 461 25 5380
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Language: Jupyter Notebook · 439 13 4556
Rongjiehuang/FastDiff
PyTorch Implementation of FastDiff (IJCAI'22)
Language: Python · 396 25 2763
wenet-e2e/wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
Language: Python · 356 14 5256
alibaba-damo-academy/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
Language: Python · 292 16 4223
janvainer/speedyspeech
Language: Python · 247 10 4235
kslz/sound_dataset_tools2
A visual tool for quickly building speech datasets
Language: Python · 186 3 915
hhguo/MSMC-TTS
Official implementation of Multi-Stage Multi-Codebook (MSMC) TTS
Language: Python · 157 15 914
X-LANCE/StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Language: HTML · 129 18 14
MontrealCorpusTools/mfa-models
Collection of pretrained models for the Montreal Forced Aligner
Language: Python · 101 7 2019
thuhcsi/FlatTN
Chinese Text Normalization and Dataset
Language: Python · 76 2 415
p0p4k/vits3_pytorch
Language: Python · 26 7 12
faliwang/Universal-Adaptor
Language: Python · 7 1 10
zjumml/Revisit-NAR-TTS
Language: Python · 3 0 00