cythc's Stars
2noise/ChatTTS
A generative speech model for daily dialogue.
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
FunAudioLLM/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking. Features real-time end-to-end speech input and streaming audio output for conversation.
YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy
Diffusion model papers, survey, and taxonomy
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
pettarin/forced-alignment-tools
A collection of links and notes on forced alignment tools
csteinmetz1/auraloss
Collection of audio-focused loss functions in PyTorch
litagin02/Style-Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
yangdongchao/AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Plachtaa/seed-vc
State-of-the-art zero-shot voice conversion and singing voice conversion with in-context learning
ZiyaoLi/fast-kan
FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN)
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
dongzhuoyao/awesome-flow-matching
A summary of related work on flow matching and stochastic interpolants
X-LANCE/VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
belambert/asr-evaluation
Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).
OpenT2S/LlamaVoice
LlamaVoice is a Llama-based large voice generation model, providing inference and training capabilities.
kslz/sound_dataset_tools2
A visual tool for quickly building speech datasets
descriptinc/cargan
Official repository for the paper "Chunked Autoregressive GAN for Conditional Waveform Synthesis"
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
thuhcsi/SECap
0nutation/USLM
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
vtuber-plan/NSF-HiFiGAN
Vocoder NSF-HiFiGAN (Moved into deepaudio)
ttslr/python-MCD
jingzhunxue/flow_mirror
Flow Mirror models from JZX AI Labs
HydraFormer/hydraformer