GuangkeChen's Stars
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
2noise/ChatTTS
A generative speech model for daily dialogue.
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Plachtaa/VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/
rany2/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
enhuiz/vall-e
An unofficial PyTorch implementation of the audio LM VALL-E
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
THUDM/GLM-4-Voice
GLM-4-Voice | 端到端中英语音对话模型
lifeiteng/vall-e
PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
homebrewltd/ichigo
Local realtime voice AI
gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities。
Standard-Intelligence/hertz-dev
first base model for full-duplex conversational audio
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
facebookresearch/spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
huawei-noah/Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
shibing624/parrots
Automatic Speech Recognition(ASR), Text-To-Speech(TTS) engine. 中英语音识别、多角色语音合成,支持多语言,准确率高
poloclub/wizmap
Explore and interpret large embeddings in your browser with interactive visualization! 📍
centerforaisafety/HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
opendilab/CleanS2S
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
ArthurHeitmann/arctic_shift
Making Reddit data accessible to researchers, moderators and everyone else. Interact with the data through large dumps, an API or web interface.
westlake-baichuan-mllm/bc-omni
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
haidog-yaqub/EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
NVIDIA/audio-flamingo
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
xinchen-ai/Westlake-Omni
GraySwanAI/nanoGCG
A fast + lightweight implementation of the GCG algorithm in PyTorch
neulab/Pangea
This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"
2DIPW/gpt_sovits_infer_with_emotion
基于中文文本情绪分析自动切换参考音频的 GPT-SoVITS 推理 Demo
parrot-tts/Parrot-TTS
Official Code for ParrotTTS
cwang621/blsp-emo
BLSP-Emo: Towards Empathetic Large Speech-Language Models
solalala-12/Singing-Voice-Conversion
2019/04~2019/09 투빅스 Singing Voice Conversion