JunjunCui

JunjunCui's Stars

liuhuang31/Megatts2_HierSpeechpp
Megatts2 using HierSpeechpp's vocoder
Language: Python 181
hertz-pj/SNAC-Vocos
A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos.
Language: Python 375
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Jupyter Notebook, 21.7k stars, 2.3k forks
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
Language: Python, 2.8k stars, 230 forks
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Language: Python, 3.2k stars, 278 forks
hubertsiuzdak/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Language: Python 52629
younengma/eden-tts
Language: Python 83
2noise/ChatTTS
A generative speech model for daily dialogue.
Language: Python, 35.4k stars, 3.8k forks
hollobit/GenAI_LLM_timeline
ChatGPT, GenerativeAI and LLMs Timeline
95358
lifeiteng/naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Language: Python 19515
LSimon95/megatts2
Unofficial implementation of Megatts2
Language: Python 27937
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language: Python, 38.9k stars, 4.9k forks
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Language: Python 39642
WhisperSpeech/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
Language: Jupyter Notebook, 4.2k stars, 234 forks
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Language: Jupyter Notebook, 8.2k stars, 785 forks
metame-ai/awesome-audio-plaza
Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation
37717
imdanboy/jets
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
Language: Python 10912
scutcsq/Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
Unofficial PyTorch reproduction of the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (arXiv:2401.01498)
Language: Python 614
Render-AI/Voicebox
1
RVC-Boss/GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
Language: Python, 43.3k stars, 4.8k forks
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
Language: Python, 8.3k stars, 1.2k forks
sh-lee-prml/HierSpeechpp
The official implementation of HierSpeech++
Language: Python, 1.2k stars, 151 forks
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python, 8.9k stars, 692 forks
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Language: Python, 3.6k stars, 322 forks
lucidrains/voicebox-pytorch
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
Language: Python 64553
LinkSoul-AI/LLaSM
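The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.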
Language: Python 55156
0nutation/USLM
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
Language: Python 13810
ZhangXInFD/soundstorm-speechtokenizer
Implementation of SoundStorm built upon SpeechTokenizer.
Language: Python 10814
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Language: Python 53950
lucidrains/spear-tts-pytorch
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch
Language: Python 26819
