CaoYuhang's Stars
sindresorhus/awesome
😎 Awesome lists about all kinds of interesting topics
krahets/hello-algo
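"Hello Algo": a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart code. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing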
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
EleutherAI/gpt-neo
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
myshell-ai/MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
Stability-AI/stable-audio-tools
Generative models for conditional audio generation
Rikorose/DeepFilterNet
Noise suppression using deep filtering
Plachtaa/seed-vc
zero-shot voice conversion & singing voice conversion, with real-time support
NUS-HPC-AI-Lab/VideoSys
VideoSys: An easy and efficient system for video generation
juncongmoo/chatllama
ChatLLaMA 📢 Open source implementation for LLaMA-based ChatGPT runnable on a single GPU. 15x faster training process than ChatGPT
ZihanWang314/RAGEN
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
NexaAI/Awesome-LLMs-on-device
Awesome LLMs on Device: A Comprehensive Survey
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
lucidrains/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
dongzhuoyao/awesome-flow-matching
A summary of related works about flow matching, stochastic interpolants
AMAAI-Lab/mustango
Mustango: Toward Controllable Text-to-Music Generation
LSimon95/megatts2
Unofficial implementation of Megatts2
AaronZ345/GTSinger
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
OpenT2S/LlamaVoice
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
JusperLee/TIGER
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
qiuqiao/SOFA
SOFA: Singing-Oriented Forced Aligner
ex3ndr/supervoice-voicebox
VoiceBox neural network implementation
NeuralVox/OpenPhonemizer
An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GPL phonemizer.
adelacvg/detail_tts
All generative models in one for a better TTS model
reppy4620/convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by lightning
tangYang7/fluency_scorer
An unofficial implementation of a speech fluency assessment model
CaoYuhang/SpeechAlgorithms
Speech Algorithms