GuangkeChen

GuangkeChen's Stars

RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Language:Python38.4k 223 1.4k4.3k
2noise/ChatTTS
A generative speech model for daily dialogue.
Language:Python33.5k 191 5833.6k
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Language:Python8.8k 82 4481.1k
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Language:Python7.8k 82 154769
rany2/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Language:Python6.7k 53 243659
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
Language:Python3k 88 98418
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language:Python2.7k 30 52185
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
Language:Python2.5k 29 125205
lifeiteng/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Language:Python2.1k 49 127322
homebrewltd/ichigo
Local realtime voice AI
Language:Python1.9k 19 6991
gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Language:Python1.7k 94 55208
Standard-Intelligence/hertz-dev
first base model for full-duplex conversational audio
Language:Python1.7k 19 26110
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Language:Python1.4k 33 8993
facebookresearch/spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
Language:Python859 18 1956
huawei-noah/Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Language:Jupyter Notebook567 23 31122
shibing624/parrots
Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine. Chinese and English speech recognition and multi-speaker speech synthesis, with multilingual support and high accuracy.
Language:Python484 12 2790
poloclub/wizmap
Explore and interpret large embeddings in your browser with interactive visualization! 📍
Language:TypeScript437 6 2229
centerforaisafety/HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Language:Jupyter Notebook386 6 5061
opendilab/CleanS2S
High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex speech-interaction prototype agent implemented in just one file!
Language:Python307 6 2029
ArthurHeitmann/arctic_shift
Making Reddit data accessible to researchers, moderators and everyone else. Interact with the data through large dumps, an API or web interface.
Language:TypeScript301 12 2322
westlake-baichuan-mllm/bc-omni
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
256 18 37
haidog-yaqub/EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Language:Python249 18 59
NVIDIA/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Language:Python216 6 1115
xinchen-ai/Westlake-Omni
Language:Python187 6 1017
GraySwanAI/nanoGCG
A fast + lightweight implementation of the GCG algorithm in PyTorch
Language:Python152 2 1237
neulab/Pangea
This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"
Language:Python96 3 77
2DIPW/gpt_sovits_infer_with_emotion
GPT-SoVITS inference demo that automatically switches the reference audio based on sentiment analysis of the Chinese input text
Language:Python83 1 18
parrot-tts/Parrot-TTS
Official Code for ParrotTTS
Language:Python4610
cwang621/blsp-emo
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Language:Python41 4 23
solalala-12/Singing-Voice-Conversion
2019/04~2019/09 ToBig's (투빅스) Singing Voice Conversion
Language:Jupyter Notebook40 3 411