JunjunCui's Stars
liuhuang31/Megatts2_HierSpeechpp
Megatts2 using HierSpeechpp's vocoder
hertz-pj/SNAC-Vocos
A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos.
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
gpt-omni/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
hubertsiuzdak/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
younengma/eden-tts
2noise/ChatTTS
A generative speech model for daily dialogue.
hollobit/GenAI_LLM_timeline
ChatGPT, GenerativeAI and LLMs Timeline
lifeiteng/naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
LSimon95/megatts2
Unofficial implementation of Megatts2
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
WhisperSpeech/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
metame-ai/awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
imdanboy/jets
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
scutcsq/Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
Render-AI/Voicebox
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
sh-lee-prml/HierSpeechpp
The official implementation of HierSpeech++
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
lucidrains/voicebox-pytorch
Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch
LinkSoul-AI/LLaSM
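The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.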
0nutation/USLM
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
ZhangXInFD/soundstorm-speechtokenizer
Implementation of SoundStorm built upon SpeechTokenizer.
ZhangXInFD/SpeechTokenizer
Code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models". Samples are presented on
lucidrains/spear-tts-pytorch
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in PyTorch