cythc

cythc's Stars

2noise/ChatTTS
A generative speech model for daily dialogue.
Language: Python 32k 185 5513.5k
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
Language: Jupyter Notebook 10.9k 140 3571.1k
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python 6k 57 483643
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Language: Python 3k 96 107267
YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy
Diffusion model papers, survey, and taxonomy
3k 53 8247
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
Language: Python 1.3k 46 4885
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Language: Python 1.2k 26 76111
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
Language: Python 875 70 0100
pettarin/forced-alignment-tools
A collection of links and notes on forced alignment tools
Language: Python 871 38 686
csteinmetz1/auraloss
Collection of audio-focused loss functions in PyTorch
Language: Python 735 18 3667
litagin02/Style-Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
Language: Python 735 14 11291
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Language: Jupyter Notebook 714 15 6687
yangdongchao/AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Language: Python 589 31 4081
Plachtaa/seed-vc
State-of-the-Art zero-shot voice conversion & singing voice conversion with in context learning
Language: Python 503 18 3154
ZiyaoLi/fast-kan
FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN)
Language: Jupyter Notebook 361 2 1447
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Language: Python 359 23 2142
dongzhuoyao/awesome-flow-matching
A summary of related works about flow matching, stochastic interpolants
312 11 210
X-LANCE/VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
Language: Python 306 15 1721
belambert/asr-evaluation
Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).
Language: Python 269 15 978
OpenT2S/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
Language: Python 217 22 311
kslz/sound_dataset_tools2
A visual tool for quickly building speech datasets
Language: Python 191 3 917
descriptinc/cargan
Official repository for the paper "Chunked Autoregressive GAN for Conditional Waveform Synthesis"
Language: Python 188 22 1429
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Language: Python 154 14 610
thuhcsi/SECap
Language: Python 137 3 912
0nutation/USLM
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
Language: Python 136 8 411
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Language: Python 119 7 411
vtuber-plan/NSF-HiFiGAN
Vocoder NSF-HiFiGAN (Moved into deepaudio)
Language: Python 48 6 02
ttslr/python-MCD
Language: Python 46 3 37
jingzhunxue/flow_mirror
flow mirror models from JZX AI Labs
Language: Python 40 2 12
HydraFormer/hydraformer
Language: C++ 9 2 01