sunxh16's Stars
voidful/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
2noise/ChatTTS
A generative speech model for daily dialogue.
jishengpeng/Languagecodec
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
willisma/SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
bytedance/Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
baofff/U-ViT
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
Tele-AI/TeleSpeech-ASR
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
liutaocode/TTS-arxiv-daily
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
soimort/you-get
:arrow_double_down: Dumb downloader that scrapes the web
metame-ai/awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
RetroCirce/MusicLDM
The latent diffusion model for text-to-music generation.
Stability-AI/stable-audio-tools
Generative models for conditional audio generation
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
horseee/Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
sh-lee-prml/HierSpeechpp
The official implementation of HierSpeech++
huggingface/dataspeech
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
scutcsq/Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
liguodongiot/llm-action
This project aims to share the technical principles behind large models, along with hands-on practical experience.
lucidrains/spear-tts-pytorch
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
cpdu/vallt
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
PolyAI-LDN/pheme
Jackiexiao/tts-frontend-dataset
TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization