zhucq

zhucq's Stars

ollama/ollama
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
Language: Go 130k 739 6.2k 10.7k
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Language: Python 41.5k 232 1.5k 4.6k
2noise/ChatTTS
A generative speech model for daily dialogue.
Language: Python 34.8k 192 6133.8k
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Language: Python 31.1k 422 4.2k 6.5k
fishaudio/fish-speech
SOTA Open Source TTS
Language: Python 19.6k 113 5201.5k
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python 11.3k 92 8321.1k
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python 8.6k 81 244670
Vaibhavs10/insanely-fast-whisper
Language: Jupyter Notebook 8.1k 66 201583
kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Language: Python 7.6k 83 109613
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Language: Python 5.5k 49 461422
k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
Language: C++ 5.1k 73 744568
pipecat-ai/pipecat
Open Source framework for voice and multimodal conversational AI
Language: Python 5k 45 318570
espeak-ng/espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
Language: C 4.7k 104 1k 971
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language: Python 4.7k 49 176423
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Language: Python 3.8k 46 95413
WLiK/LLM4Rec-Awesome-Papers
A list of awesome papers and resources of recommender system on large language model (LLM).
1.6k 24 8133
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Language: Python 1.5k 33 99123
RasaHQ/rasa-demo
:tiger: Sara - the Rasa Demo Bot: An example of a contextual AI assistant built with the open source Rasa Stack
Language: Python 971 55 461801
Tele-AI/TeleSpeech-ASR
Language: Python 638 17 5860
git-disl/awesome-LLM-game-agent-papers
A Survey on Large Language Model-Based Game Agents
482 11 319
auspicious3000/contentvec
speech self-supervised representations
Language: Python 481 11 3239
rtvi-ai/rtvi-web-demo
Example UI implementing the RTVI web client
Language: TypeScript 471 9 1168
AdolfVonKleist/Phonetisaurus
Phonetisaurus G2P
Language: Shell 462 36 55121
nnaisense/bayesian-flow-networks
This is the official code release for Bayesian Flow Networks.
Language: Python 267 12 629
sign-language-translator/sign-language-translator
Python library & framework to build custom translators for the hearing-impaired and translate between Sign Language & Text using Artificial Intelligence.
Language: Python 205 9 1436
Plachtaa/FAcodec
Training code for FAcodec presented in NaturalSpeech3
Language: Python 195 10 2822
lifeiteng/naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Language: Python 190 6 815
isi-nlp/uroman
Universal Romanizer that can convert any unicode script to roman (latin) script
Language: Perl 180 13 1515
microsoft/reliableAI
Language: Python 42 4 27
rgu-iit-bt/cbr-for-legal-rag
Language: Jupyter Notebook 13 0 03