blues-green's Stars
RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
black-forest-labs/flux
Official inference repo for FLUX.1 models
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
keithito/tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Rayhane-mamah/Tacotron-2
DeepMind's Tacotron-2 Tensorflow implementation
ming024/FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Kyubyong/tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
modelscope/3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
LTH14/mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
bytedance/music_source_separation
BytedanceSpeech/seed-tts-eval
sihyun-yu/REPA
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
xcmyz/FastSpeech
An implementation of FastSpeech based on PyTorch.
lturing/tacotronv2_wavernn_chinese
Chinese speech synthesis with tacotronV2 + wavernn (Tensorflow + pytorch)
Vaibhavs10/fast-whisper-finetuning
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
hugofloresgarcia/vampnet
music generation with masked transformers!
Executedone/Chinese-FastSpeech2
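Continued training on the Biaobei (标贝) data, with improvements to the original FastSpeech2 model: prosody representation and prosody prediction modules are introduced to make Chinese pronunciation more vivid and rhythmic.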
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
nicolaus625/FM4Music
The official GitHub page for the survey paper "Foundation Models for Music: A Survey".
chitosai/eye_protector
May it be the best eye-protecting extension for Chrome.
haoheliu/SemantiCodec-inference
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
lifeiteng/naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
archinetai/cqt-pytorch
An invertible and differentiable implementation of the Constant-Q Transform (CQT).
ZZWaang/whole-song-gen
biboamy/instrument-streaming
DezhiKong00/Sentencepiece-chinese-bbpe
Tokenizing a Chinese corpus with Sentencepiece