zhucq's Stars
ollama/ollama
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
RVC-Boss/GPT-SoVITS
One minute of voice data is enough to train a good TTS model (few-shot voice cloning).
2noise/ChatTTS
A generative speech model for daily dialogue.
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
fishaudio/fish-speech
SOTA Open Source TTS
FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Vaibhavs10/insanely-fast-whisper
kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, and Rust.
pipecat-ai/pipecat
Open Source framework for voice and multimodal conversational AI
espeak-ng/espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
WLiK/LLM4Rec-Awesome-Papers
A list of awesome papers and resources on recommender systems built on large language models (LLMs).
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
RasaHQ/rasa-demo
Sara - the Rasa Demo Bot: An example of a contextual AI assistant built with the open source Rasa Stack
Tele-AI/TeleSpeech-ASR
git-disl/awesome-LLM-game-agent-papers
A Survey on Large Language Model-Based Game Agents
auspicious3000/contentvec
speech self-supervised representations
rtvi-ai/rtvi-web-demo
Example UI implementing the RTVI web client
AdolfVonKleist/Phonetisaurus
Phonetisaurus G2P
nnaisense/bayesian-flow-networks
This is the official code release for Bayesian Flow Networks.
sign-language-translator/sign-language-translator
Python library & framework to build custom translators for the hearing-impaired and translate between Sign Language & Text using Artificial Intelligence.
Plachtaa/FAcodec
Training code for FAcodec presented in NaturalSpeech3
lifeiteng/naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
isi-nlp/uroman
Universal Romanizer that can convert any Unicode script to Roman (Latin) script
microsoft/reliableAI
rgu-iit-bt/cbr-for-legal-rag